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ABSTRACT 

Semi-analytic models are a powerful tool for studying the formation of galaxies. How- 
ever, these models inevitably involve a significant number of poorly constrained param- 
eters that vaxiat be adjiistcd to provide an acceptable match to the observed universe. 
In this paper, we set out to quantify the degree to which observational data-sets can 
constrain the model parameters. By revealing degeneracies in the parameter space we 
can hope to better understand the key physical processes probed by the data. We 
use novel mathematical techniques to explore the parameter space of the GALFORM 
semi-analytic model. We base our investigation on the Bower et al. 2006 version of 
GALFORM, adopting the same methodology of selecting model parameters based on 
an acceptable match to the local hj and K luminosity functions. Since the GAL- 
FORM model is inherently approximate, we explicitly include a model discrepancy 
term when deciding if a match is acceptable or not. The model contains 16 parame- 
ters that are poorly constrained by our prior understanding of the galaxy formation 
processes and that can plausibly be adjusted between reasonable limits. We investigate 
this parameter space using the Model Emulator technique, constructing a Bayesian 
approximation to the GALFORM model that can be rapidly evaluated at any point in 
parameter space. The emulator returns both an expectation for the GALFORM model 
and an uncertainty which allows us to eliminate regions of parameter space in which 
it is implausible that a GALFORM run would match the luminosity function data. 
By combining successive waves of emulation, we show that only 0.26% of the initial 
volume is of interest for further exploration. However, within this region we show that 
the Bower et al. 2006 model is only one choice from an extended sub-space of model 
parameters that can provide equally acceptable fits to the luminosity function data. 
We explore the geometry of this region and begin to explore the physical connections 
between parameters that are exposed by this analysis. We also consider the impact 
of adding additional observational data to further constrain the parameter space. We 
see that the known tensions existing in the Bower et al. 2006 model lead to a further 
reduction in the successful parameter space. 



1 INTRODUCTION 

Semi-analytic galaxy formation models are a successful tool 
for exploring the physical processes responsible for galaxy 
formation. In essence this technique aims to understand the 
formation of galaxies by breaking the problem down into 
a discrete set of (typically non-linear) differential equations 
describing each physical process. For example, the amount 
of gas able to cool from the halo depends non-lincarly on 
the halo mass and its gas content. These discrete processes 
are then coupled through a set of interactions. For exam- 
ple, the cold gas mass grows as a result of gas accretion and 
cooling and decreases as a result of star formation and gas 
ejection. In simple cases, tlie network of equations can be 
integrated analytically to make quantitative predictions for 
the properties of the galaxy population. In more complex 
cases, the set of equations must be solved numerically, but 



the computational task is still minor compared to integrat- 
ing fundamental physical laws on a particle by particle (or 
cell-by-cell) basis, as required by a fully numerical approach 
(for a few state of the art examples of the fully numerical ap- 
proach see Grain et al. 2009; Schaye et al. 2009; Gnedin et al. 
2009). However, although the description of each individual 
component of the semi-analytic model may appear simple, 
the complex interplay between the components means that 
the outcome of a model is notoriously hard to predict. 

Nevertheless, such models have been very successful 
in defining our current picture of how galaxies form. Ini- 
tial models, such as White & FVenk (1991), Lacey & Silk 
(1991), Kauffmann, White, & Guiderdoni (1993) and Gole 
et al. (1994) showed how the formation of galaxies resulted 
from a competition between gas cooling and accretion, and 
the ejection of gas from galaxies in supernova-driven winds. 
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This type of feedback explained the observed paucity of faint 
galaxies compared to the high abundance of low mass cold 
dark matter haloes. By incorporating these effects into a re- 
alistic model for the growth of dark matter haloes and galax- 
ies, these models were able to make a quantitative connec- 
tion between the assumptions about gas cooling, star for- 
mation, feedback, merging and other physical ingredients, 
and the observed properties of galaxies. Over the past two 
decades, the sophistication of these models has increased, 
allowing them to make predictions for many more observa- 
tional properties such as gala;xy sizes, colours, infrared lu- 
minosities and correlation functions (eg., KaufFmann et al. 
1999; Somcrville & Primack 1999; Cole ct al. 2000; Granato 
et al. 2000, 2004; Baugh et al. 2005; Menci ct al. 2005, 2006; 
Cattaneo et al. 2006; Kang, Jing, & Silk 2006; Monaco 2007). 
At the same time, the improvement in our knowledge of the 
cosmological parameters has tied down some of the major 
uncertainties in the input physical description (eg., Dunk- 
ley et al. 2009). As a result, the comparative power of the 
models has increased. 

A particular issue that has been revealed is the need for 
additional physics to match the sharp break at the bright 
end of the galaxy luminosity function. A rmmbcr of addi- 
tional physical processes have been proposed (c.f. Benson et 
al 2003a) but the currently favored explanation centres on 
an additional feedback channel motivated by observations of 
the interactions between radio galaxies and the surrounding 
IGM in clusters. Although implementations differ, the aim 
of this "radio-mode" feedback is to suppress cooling in the 
most massive haloes leading to the sharp break in the lu- 
minosity function (Bower et al. 2006 [Bow06]; Croton ct al. 
2006; Cattaneo ct al. 2006; Somcrville et al. 2008). An im- 
portant result of implementing this type of feedback in the 
models is that they then predict that much of the star forma- 
tion in the largest galaxies will be completed relatively early 
in the history of the universe, in many cases above redshift 
2. This has largely eased the conflict between observations 
of a large population of passive galaxies at high redshift and 
the tendency for Cold Dark Matter (CDM) models to form 
the largest dark matter structures only recently (Bow06). 

Despite these successes, the semi-analytic technique has 
been criticised for a perceived lack of predictive power. Each 
component of the model must simply encapsulate the physi- 
cal process that it describes. However, since the processes are 
poorly understood, this almost inevitably involves parame- 
tcrising the process in such a way that our limited knowledge 
or understanding can be included by allowing parameters to 
vary between plausible limits. By comparing the model to a 
limited set of observational results, the model can be cali- 
brated and then, with the values of the parameters fixed, the 
model can be tested against additional observational con- 
straints. While the traditional approach, such as that used 
in Cole et al. (2000) or Bow06, is iterate on an intital guess 
to find a single set of parameters that adequately match the 
calibration data, this is clearly a Bayesian problem in which 
we should seek to use the observational data to successively 
constrain the parameter space of acceptable models. 

In this paper we set out to make a systematic explo- 
ration of the parameter space of the Bow06 version of the 
GALFORM model. This contains 16 parameters which can 
reasonably be adjusted over a plausible range. We note ear- 
lier GALFORM models have considered an even larger pa- 



rameter space: for example, the Baugh ct al. (2005) model 
uses a different parameterization for the disk star formation 
timescale, includes a mode of superwind feedback (eg., Ben- 
son et al. 2003a), allows for a different IMF in starbursts 
from that in disk star formation, and uses different descrip- 
tions of gas cooling and gas reheating (cf., Bow06). These 
differences are not considered here — our purpose is to com- 
pare the parameter set identified by Bow06 with the full 
parameter space available in that model. We explore the ef- 
fect of introducing additional physical processes in Benson 
& Bower 2009. 

A variety of strategies for calibrating the model parame- 
ters have been adopted in published semi-analytical models. 
The majority of models have used observational data on se- 
lected galaxy properties at « = to choose a "best fit" set of 
parameters, and have then made predictions for higher red- 
shifts, but some models have also supplemented the z = 
constraints with observational data on higli-rcdsliift galax- 
ies when choosing the "best fit" model parameters. Different 
authors have made different choices as to what is the best 
set of z = properties to use in setting the model param- 
eters. For example, Kauffmann et al. (1993), Kauffmann et 
al. (1999), Somcrville & Primack (1999) and Dc Lucia et al. 
(2004) used the normalization of the TuUy-Fisher relation 
and the gas masses of Milky Way-like galaxies as their pri- 
mary observational constraints. On the other hand. Cole et 
al. (1994), Cole et al. (2000), Nagashima et al. (2001), Kang 
ct al. (2005), Baugh et al. (2005) and Bow06 all used the 
galaxy optical and near-lR luminosity functions as their pri- 
mary constraints. In addition, most models have used addi- 
tional z — properties beyond their "primary" constraint in 
choosing best-fit parameters. For example. Cole et al.(2000) 
used the gas fractions and sizes of galaxy disks, together 
with the ratio of early to late morphological types and the 
stellar metallicities of elliptical galaxies, in addition to the 
B- and i^-band luminosity functions. In contrast to this, 
Benson ct al. 2003a and Bow06 chose to focus on obtaining 
a good match to the z = B- and i^-band luminosity func- 
tions (but it is important to note that the starting point for 
iteration was taken from the Cole et al. 2000 model). Sev- 
eral subsequent papers have explored the performance of 
the Bow06 model with respect to additional data sets (eg.. 
Bower et al. 2008; Font ct al. 2008; Gonzalez-Perez et al. 
2008; Seek Kim et al. 2009; Gonzalez et al. 2009). 

These different strategies for calibrating the model pa- 
rameters have different advantages and drawbacks. The 
Bow06 approach has the advantage of simplicity, and that 
the z = luminosity functions are measured accurately and 
(largely) free from observational selection effects. Further- 
more, the model outputs do not require a highly complex 
layer of additional processing to cast them into the observed 
quantities (of course population synthesis models are still 
required). A disadvantage is that the present-day optical 
and ncar-lR luminosity functions arc relatively insensitive 
to some model parameters, such as those controlling the star 
formation timescale (e.g. Cole et al. 2000). For this reason, it 
is helpful to introduce additional observational constraints 
to which these other model parameters are more sensitive. 
For example. Cole ct al. 2000 found that the gas mass vs 
luminosity relation for disk galaxies provides very good con- 
straints on the model parameters for star formation. Poten- 
tial drawbacks of introducing extra observational constraints 
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beyond the z — luminosity functions are that they may 
be less accurately determined obscrvationally, and that a 
subjective decision is required to assign relative weights to 
the different observational constraints. In addition, if all the 
available data-sets arc used to constrain the model, no ob- 
servations will be immediately available to independently 
test its validity. This is a deep philosophical issue that we 
will not tackle here, but clearly we should seek a strategy in 
which the physical role of each constraint is clear. 

Thus, given the wide variety of observational data that 
could be used to constrain semi-analytical models, each with 
their own random and systematic errors, what is needed is 
some more objective procedures for evaluating what is the 
range of model parameters consistent with a particular com- 
bination of observational constraints, and what is the effect 
on this range of adding or removing a particular observa- 
tional constraint. In this way, we hope to end up with an 
objective measure of how robust different predictions from 
the model are, including how sensitive they are to includ- 
ing different model ingredients and different observational 
constraints. 

This paper is a first step in this program. We intro- 
duce a new method of exploring the model parameter space 
to identify those regions that produce acceptable matches 
to the observational data. For simplicity, in this paper we 
follow the approach of Bow06 and use only the bj and K- 
band z = luminosity functions to directly constrain the 
acceptable regions of parameter space. Once we have identi- 
fied these regions, we briefly examine the performance of the 
model with respect to additional z = data sets, but these 
are not used to define the initial search criterion. Further- 
more, we use the same version of GALFORM as in Bow06, 
making it simple to compare the unique parameter set pre- 
sented in Bow06 with the full parameter space that we iden- 
tify in our search here. Since the Bow06 model is imple- 
mented on the Millennium N-body simulation (Springel et 
al. 2005), we adopt the same fixed cosmological parameter 
set. In principle, the methods we present here could be ex- 
tended to allow the cosmological background model to vary. 

Our investigation aims to address some key questions: 
How large is the range of parameter space that produces 
acceptable fits? Is the parameter set selected in Bow06 in 
some sense typical or optimal? It is unlikely that there is 
a single "best value". Given the relatively large number of 
model parameters, there will be a range of parameter val- 
ues giving acceptable fits. Moreover, we should be careful to 
define what we mean by an "acceptable" fit. Since the GAL- 
FORM model is only an approximation to reality, we would 
not expect the model to exactly rei)ro(iucc all the observa- 
tional data, even if the model's parameters were set to their 
"best" values. The Bayesian approach we adopt requires us 
to formalize this uncertainty by introducing a "model dis- 
crepancy" term {emd) into our comparison with the data. 
This has the effect of ensuring that we do not reject a region 
of parameter space if the comparison with the data is suffi- 
ciently good that future improvements in the model (which 
reduce the degree of approximation) may result in improved 
agreement with the data (in that region). This approach is 
fundamentally different from simply requiring that we find 
the region of agreement within the observational uncertain- 
ties — it recognises that the model is itself approximate. 
Ignoring emd will lead us to focus on an unjustifiably nar- 



row region of parameter space. In this case, reducing the 
level of approximation in the model would cause new re- 
gions of acceptable parameter space to appear in areas that 
were previously deemed implausible. 

Of course, estimation of emd is uncertain. In principle, 
one could hope to arrive at a value by tracking changes to 
the model as the level of approximation is reduced. This 
approach is an active subject in the statistical litterature 
(eg. Goldstein & Rougier, 2009), but the methods are not 
yet suitable for application to GALFORM. Instead, we ad- 
dressed the model discrepancy term by constructing a series 
of test luminosity functions and asking ourselves whether 
we would comfortably reject the corresponding region of pa- 
rameter space on the basis of the comparison and our pre- 
vious experience of imjjrovements to the GALFORM code. 
Reassuringly, our estimate of emd results in the Bow06 be- 
ing marginally acceptable. Thus the model discrepancy is 
consistent with our aim of searching parameter space for 
parameter sets that perform comparably to (or better than) 
Bow06. 

Our task is therefore to evaluate the GALFORM model 
over the input parameter space, identifying the portion of 
this space for which fits to the local luminosity functions 
are acceptable. Unfortunately, the 16-dimensional parame- 
ter space (that is introduced below) is extremely large. Di- 
viding each axis into (just) 5 values and exploring all possi- 
ble parameter combinations would require 10^^ evaluations 
of the model (and hence require computing time in excess 
of 10* cpu-years). Even if this were possible, the resulting 
grid would be of such low resolution that it would give lit- 
tle indication of the GALFORM parameter space. Clearly a 
much better targeted strategy can be devised. 

In this paper, we use the "model emulator" technique 
(eg., Craig et al. 1997; Kennedy & O'Hagan 2001; Vernon et 
al. 2010) to explore the parameter space. This technique has 
been specifically developed within the statistics community 
in order to analyze models that possess high-dimensional pa- 
rameter spaces. It involves constructing a stochastic model 
that emulates the output of the GALFORM model. The em- 
ulator is constructed so as to reproduce the results of known 
runs and statistically interpolate between them taking into 
account the appropriate correlation length of the model. At 
each new point, the model provides an expectation value 
for the outcome of a GALFORM evaluation and a variance 
refiecting the degree of uncertainty in the emulator output. 
An evaluation of the emulator is of order 10^ times faster 
than an evaluation of the full model, and the emulator can 
therefore be used to eliminate regions of parameter space 
for which it is implausible that an evaluation of GALFORM 
will result in an acceptable match to the observational data. 
By proceeding in waves of emulation, we successively reduce 
the volume that must be investigated at each level until the 
volume that must be directly evaluated is a tiny fraction 
(less than 0.3%) of the original parameter space. The pri- 
mary advantage of the errmlator is its speed, which allows 
us to investigate the full parameter space efficiently and re- 
stricts time-consuming evaluations of the GALFORM model 
to regions of parameter space where the outcome cannot be 
predicted with sufficient accuracy by the emulator. Com- 
bining the emulator method with an efficient strategy for 
sparsely sampling the parameter space, we can explore the 
parameter space of galaxy formation with around a month 
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of CPU time. These techniques are gaining widespread ac- 
ceptance in the chmate research community where full eval- 
uations of the computer model are prohibitively expensive. 
The parallel with the galaxy formation problem is powerful 
and illustrative (Vernon et al. 2010). Our work is also closely 
related to studies of the galaxy formation parameter space 
that are based on exploration with Monte Carlo Markov 
chain techniques (Henriques et al. 2009, Kampakoglou et al. 
2008). These methods currently consider lower dimension- 
ality than we address here, and it should be noted that, in 
general, MCMC techniques may face problems when dealing 
with high-dimensional input spaces. We should also stress 
that the dimensionality of the problem that we consider 
here is likely to greatly increase as additional physical pro- 
cesses are included in the model. Indeed, this has been 
one of the major motivations for the development of the 
emulation techniques presented here (Oakley & O'Hagan 
2004). For a summary of state-of-the-art emulation tech- 
niques see the Managing Uncertainty for Complex Models 



website http://mucm.group.shef.ac.uk/index.htmI 



The emulator process identifies a small fraction of the 
total input space as generating acceptable luminosity func- 
tions. The geometry and extent of the region is, however, 
hard to comprehend. As we will see, some parameters (or 
parameter combinations) are poorly constrained: this can be 
viewed as telling us that these have little role in determining 
certain observable properties of galaxies. Conversely, some 
parameter combinations are tightly constrained: these play 
a critical role, and we can hope to use this to understand 
more about the interplay of the components in the GAL- 
FORM model, and thus to better understand the physics 
underlying the galaxy formation process. 

As we have already stressed, this paper concentrates 
on the Bow06 version of the GALFORM code. Future pa- 
pers will explore the much larger parameter space created 
by recent updates to the code, introducing new physical pro- 
cesses to the problem, such as better treatment of angular 
momentum, a physical description of ram pressure stripping 
(cf., Font et al. 2008), AGN heating of halo gas (Bower et al. 
2008), and a variable stellar IMF (Baugh et al. 2005). We 
also extend our new parameter search technique to use a 
wider range of calibration data from the outset (cf., Benson 
& Bower 2009). 

The paper is laid out as follows. In §2, we provide a 
brief overview of the GALFORM code, and outline the phys- 
ical meaning of the parameters that we vary in this project. 
In §3, we describe the model emulator technique on which 
our parameter exploration is based. §4 presents the main 
results, §4.1 focussing on our success in emulating the lumi- 
nosity function and its dependence on the model's param- 
eters. Although it is not the primary focus of the paper, 
it is obviously of interest to see whether additional data 
sets break the degeneracies evident in the luminosity func- 
tion comparison. In §4.2, we briefly investigate the role of 
additional datasets. In §5, we examine the physical impli- 
cations of these results using a PCA to identify important 
combinations of the input parameters. Finally, we present 
a discussion of our work in §6 and briefly summarise our 
main conclusions in §7. Throughout, we adopt a cosmology 
in which = 0.045, Qm = 0.25, A = 0.75 and aa = 0.9 at 
the present day. The model assumes Ho = 73kms~^ Mpc~^, 



although we quote luminosities and space densities in term 
of ft = iJo/lOOkms"^ Mpc"^ 



2 PARAMETERS OF THE GALFORM CODE 

The GALFORM code contains many parameters. In Tablejl] 
we list the parameters used in the Bow06 version of the code, 
together with the range of plausible values considered in our 
analysis. We have grouped the parameters by the physical 
processes that they are associated with. Below, we briefly 
describe the parameters. For a full description, we refer the 
reader to Cole et al. (2000), Baugh et al. (2005) and Bow06. 

The flrst set of parameters are associated with star for- 
mation: determines the normalisation of the star forma- 
tion efficiency, while a* determines its dependence on the 
disk circular speed: 

SFR = (K,disk/200kms-')""* (A/coid/rdyn,disk) (1) 

where SFR is the star formation rate, Mcow the mass of cold 
gas in the galaxy, K,disk is the circular velocity of the disk 
and Tdyn.disk its dynamical time. We calculate chemical en- 
richment using the instantaneous recycling approximation, 
so the rate of ejection of newly synthesized metals into the 
ISM is given by 



Mz.cj = PyicldSFR 



(2) 



where Pyidd is the yield of metals, which depends on the 
IMF. For consistency with Bow06, we use a Kennicutt (1983) 
IMF throughout, but treat Pyiow as an adjustable parameter. 
In Font et al. (2008) and Bower et al. (2008), we showed that 
the match to the observed colours of galaxies was improved 
by adopting a higher yield (0.04) than the standard value 
(0.02). 

The second group of parameters are associated with the 
supernova-driven feedback: Vhot.disk and Vhot, burst control 
the normalization of feedback in quiescent star formation 
and bursts respectively; Qhot controls the dependence of the 
feedback on the circular velocity. For example, the rate at 
which mass is returned from the cold phase to the halo dur- 
ing quiescent star formation is given by 



Moutfl 



SFR 



K. 



14o 



(3) 



Cold gas that is ejected from the disk becomes available to 
cool and form further stars after a factor Ofrohcat"'^ times 
the halo dynamical time. In low mass haloes cooling is very 
rapid, and this parameter plays a key role in setting the disk 
fueling rate. 

AGN feedback is controlled by the parameters Ocooi, 
which effectively determines the halo mass at which this 
form of feedback becomes effective, and CEdtQ which con- 
trols the maximum energy output possible for a central su- 
permassive black hole of given Eddington luminosity LEdd. 
Speciflcally, we only allow the AGN to regulate cooling if 



^cool(^cool) ^ Q^cool ^ff(^cool) 



(4) 



^ Note that due to an error in Bow06, cooling luminosities were 
over estimated by a factor An. Thus, while the paper quotes the 
efficiency parameter esMBH 0.5, this should have been 0.5/47r = 
0.04. With this correction the results of Bow06 are unchanged. 
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and 



(5) 



where Lcooi is the radiative coohng luminosity of the halo 
gas. Note that larger values of cvcooi result in AGN feedback 
being effective in lower mass haloes. 

Galaxy mergers are dependent on the rate of decay of 
satellite orbits due to dynamical friction and on the mass 
ratio of the merging objects. The normalisation of the or- 
bital decay rate is set by /df (see Cole et al. 2000), while 
/oiiip and /burst are respectively the mass ratios needed to 
transform the morphology of the main galaxy and to cause 
a burst of star formation (see Baugh et al. 2005 and Malbon 
et al. 2007^1^ The disk stability parameter, /stab, sets the 
self-gravity threshold at which galaxy disks become unsta- 
ble to bar modes (see Bow06). This instability causes the 
cold disk gas to be consumed in a burst of star formation. 
Smaller values of this parameter make disks more prone to 
bar instabilities. 

Finally, the parameters Vcut and Zcut encapsulate the 
effect of reionisation on cooling in small haloes. For further 
discussion of this approximation, see Benson et al. (2003b). 
We will show that these parameters have little impact on 
the galaxy properties we consider here. 

We hst in Table[l]the GALFORM parameters which we 
allow to vary in our parameter space exploration, together 
with their values in the Bow06 model and the ranges over 
which we allow them to vary. Ideally we would know in ad- 
vance what range for each parameter is physically meaning- 
ful or interesting, but this is only possible for a subset of the 
parameters. For example, the parameters /eiup and /burst 
are constrained to lie in the range [0,1] by the way they 
are defined, and numerical simulations of merging galaxies 
constrain their values to an even narrower range. Similar ar- 
guements can be applied to restrict the range of pyidd, tEdd, 
/df , /stab , «cut and Zcut. On the other hand, theory does not 
currently provide any useful guide as to the value of e*, so 
the value of this parameter is set purely by comparison with 
observations and previous experience with GALFORM. In 
these cases, we selected the range by posing the question 
"if an acceptable model was found outside this range, would 
it be interesting?" . We answered no if the parameter value 
seemed inconsistent with the physical model that component 
of the code was intended to describe. The range selected is 
intended to be conservatively large, but is inevitably subjec- 
tive. In some cases the parameter value adopted in Bow06 
is uncomfortably high (e.g. the 14ot, burst and a^ot param- 
eters are in principle constrained by the amount of energy 
available from supernova explosions) and we deliberately ex- 
tended the search range in order to bracket the value from 
Bow06. 

In the following analysis, it is often helpful to use scaled 
variables so that each parameter covers the range ±1. We 
denote scaled variables by a (etc) where 



2 ('-^max Q^min) 



(6) 



^ We note, however, that the parameter /gas. burst is set to 0.1 in 
this study, and in Bow06, so that almost all sufficiently high mass 
ratio mergers result in a burst of star formation. In Malbon et al. 
this parameter was set to 0.75. 
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0.8 
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/burst 
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0.01 
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0.005 
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A 


Reionisation 


Vcut 


50 


20 
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Zcut 


6 


6 
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Table 1 . The parameters allowed to vary in our parameter space 
exploration. The first column provides an indication of the phys- 
ical process associated with the parameter. The second column 
gives the parameter name; column 3 gives the value of the pa- 
rameter used in Bow06; and columns 4 and 5 the range of the 
parameter explored in this paper. Active variables in the model 
emulator (those that are important in capturing the behaviour of 
the 2 = luminosity function output of the model, see §3.5.2) are 
indicated in column 6. 



The end result is that the model spans a 16 dimen- 
sional parameter space. However, it is extremely important 
to stress that several of these parameters have little impact 
on the GALFORM output for the selected observables, and 
thus that it is initially possible for the emulator to capture 
the behaviour of the model using many fewer parameters. 
Our first step was to identify the most important parame- 
ters whose values were key to matching the selected galaxy 
properties. In terms of the match to the bj and K luminos- 
ity functions, there are 10 active parameters (at Wave 4, see 



[ 3.5.2 I that drive the majority of the variation in model out- 
puts. These are indicated by an A in column 4 of Tablejl] As 
we will show, the parameter space of acceptable models is 
limited to a very small fraction of this volume. Even though 
adequate fits can be obtained for a wide range of parameter 
values, variations in parameters must be carefully traded off 
to keep the input parameter set on a narrow hypersurface. 

Note, however, that a parameter that is inactive when 
the model is constrained using the luminosity function data 
may play an important role in fitting other data sets. For ex- 
ample, while the reionisation parameters Vcut and Zcut have 
little effect on the global luminosity function (within the 
limits considered), these parameters play a key role in de- 
termining the satellite galaxy population of the Milky Way 
(Benson et al. 2003b). 



3 THE MODEL EMULATOR TECHNIQUE 

3.1 Bayesian Analysis of Computer Models 

There has been much interest in the statistics community in 
developing techniques to help understand and analyse com- 
plex computer simulations of real world processes, referred 
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Figure 1. Two examples of Latin Hypercube designs for a two dimensional parameter space. The range of each input is divided into n 
intervals where n is the number of points. Note that only one point is placed in each of the n intervals over Input 1 and Input 2, and 
that these points have been placed at random within each interval. The two panels show samplings with 8 and 20 points. The Maximin 
strategy adopted in this paper ensures that the points are evenly spread throughout the region that we wish to sample. 



to generically as Computer Models (Currin et al. 1991, Sat- 
ner 2003, Craig et al. 1997, O'Haggan 2006). Such models, 
of which GALFORM is an example, generally take a sig- 
nificant time to run and require the specification of a large 
number of input parameters. They involve several distinct 
sources of uncertainty, all of which need to be assessed and 
combined in a unified analysis. These fall into 5 basic types: 

[1] Parameter uncertainty: We do not know the ap- 
propriate values of the inputs to the simulator, and want to 
identify the class of inputs that give acceptable matches to 
the observed data. 

[2] Simulator uncertainty: Due to the significant run 
time we cannot hope to cover the input space with a suitably 
large number of model evaluations. Therefore, we will be 
uncertain as to the output of the model for regions of the 
input space where no evaluations have been performed. This 
uncertainty is handled through the use of an emulator as 
described in j ]3.2| 

[3] Structural uncertainty: This aspect, which is less 
familiar to the astronomical community, refers to the funda- 
mental problem that, however carefully the model has been 
constructed, there will always be a difference between the 
system (in this case the Universe) and the simulator. Simpli- 
fications in the physics, based on features that are too com- 
plicated for us to include, and simplifications and approx- 
imations in solving the equations determining the system, 
lead to a discrepancy between the model and the system. 
We represent this through use of the "model discrepancy" 
term described in §3.6| 

[4] Observational error: We do not know the prop- 
erties of the real Universe exactly, but instead have obser- 
vational measurements with corresponding errors. 

[5] Initial condition and forcing function uncer- 
tainty: Most Computer Models require the specification of 
initial conditions and/or forcing functions, the form of which 
is most likely uncertain. 

In order to analyse the input space of the GALFORM 
model, and to determine which inputs are of interest, we 



need to address all of the above five sources of uncertainty 
in a unified manner. A Bayesian approach provides a natu- 
ral framework for such an analysis. Powerful Bayesian tech- 
niques, centered around the idea of emulation, have been 
developed in the Statistics community for such problems, 
and have been successfully applied to models in several sci- 
entific disciplines (eg. Kennedy & O'Hagan 2001, Oakley 
2002, Higdon 2004, O'Hagan 2006, Schneider et al. 2008, 
Heitmann et al. 2009). However, employing a fully proba- 
bilistic Bayesian analysis (where every uncertain quantity 
is assigned a probability distribution) is often unnecessarily 
challenging and involves specifying prior distributions that 
are in some cases difficult to justify. Instead we employ the 
Bayes Linear approach (Goldstein & Woof 2007) which is 
a more tractable version of Bayesian analysis that requires 
fewer assumptions, and that deals with only expectations 



and variances of all uncertain quantities (see ^ 3.5.4 1. The 



Bayes Linear methods presented here have been successfully 
applied to several complex models including Oil Reservoir 
Models and Climate Models (Craig et al. 1997, Goldstein et 
al. 2009), and are well suited to the case of high-dimensional 
models. 



3.2 General Emulation Strategy 

An emulator is a stochastic function that represents our be- 
liefs about the behaviour of a deterministic function at in- 
put settings that have yet to be evaluated. Representing a 
model such as GALFORM as a function that maps a vec- 
tor of inputs X to a vector of outputs f(x), an emulator 
would give, for each input parameter setting x, quantities 
such as an expectation and variance of the function: E(f (x)) 
and Var(f(x)). In this way it represents the expected value 
of the function at x but also gives a measure of our un- 
certainty at this point through Var(f(x)). This uncertainty 
would be small at points close to known model runs, and 
large at points far from known runs. The expectation of the 
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emulator will (in most cases, but not always) interpolate the 
outputs of evaluated runs. 

Emulators have many advantages, the most important 
being their speed: in many cases an emulator will be many 
orders of magnitude faster than the model it represents. 
Also, emulators are designed to cope with high numbers of 
input dimensions, far more than can be handled by more 
traditional methods such as Monte Carlo Markov Chains 
(eg. Heitmann et al. 2009). 

Here we use emulation techniques to identify the set 
of all inputs x that will give rise to an acceptable match 
(with respect to all relevant uncertainties) between the bj 
and K luminosity function outputs of GALFORM and the 
corresponding observational data (Norberg et al. 2002 for 
the bj luminosity function and Cole et al. 2001 for the K 
band) . 

The general strategy is as follows. Initially we design 
a suitable set of 1000 runs of the GALFORM model cho- 
sen to be at parameter locations that will cover the input 
space efficiently, and help the construction of an acceptable 
emulator. Then we identify a subset of 11 outputs (ie., the 
predicted values of the luminosity function at selected mag- 
nitudes) which are representative of the bj and K luminos- 
ity functions, informative (about which regions of the input 
space are unacceptable) and that are also straightforward 
to emulate. We emulate each output by fitting a third or- 
der polynomial (defined over the input space) to each of the 
11 outputs, and then modeling the residuals of this fit as 
a Gaussian process. Using the emulators and assessments 



of all other relevant uncertainties (as described in f3.ll we 



then construct an Implausibility Measure defined over the 



input space (see [3.7 \. Regions of the input space that have 



a high Implausibility Measure are deemed highly unlikely to 
give an acceptable match between the luminosity function 
output of GALFORM and the observed data and are hence 
discarded from further analysis. This defines a reduced re- 
gion of input space that we can explore further. We employ 
an iterative approach, and have reduced the input space in 
four stages as is described below. 

3.3 Designing the First Set of Runs 

Determining a highly informative collection of points in in- 
put space to perform evaluations of a Computer Model such 
as GALFORM is an important task. The points must be 
space filling in addition to avoiding repeated runs at similar 
values of one or more of the inputs (as occurs regularly in 
a standard grid design). Maximin Latin Hypercube (Stein 
1987) designs fulfill both these properties and were used to 
generate the initial set of runs. These designs are also ap- 
proximately orthogonal: a desirable property when trying to 
fit polynomials to a function, as is the case when building 
an emulator. To construct a Latin Hypercube of n points, 
the range of each of the inputs must be divided into n equal 
intervals; the points are then chosen randomly so that no 
two points occupy the same interval for any of the inputs. 
Examples of 2-dimensional 8 and 20 point Latin Hypercube 
designs are shown in Fig. [T] A Maximin Latin Hypercube is 
constructed by creating a large number of Latin Hypercube 
designs, and then choosing the one that has the largest mini- 
mum distance between any pair of points within that design. 
For each design, we generated 2000 hypercubes and then se- 



lected the best one using the maximim criterion. One thou- 
sand such runs of the GALFORM model were performed 
based on such a Maximin Latin Hypercube design, and these 
runs form the basis of Wave 1 of our analysis. 



3.4 Choosing Outputs 

Once the 1000 runs were completed, 11 outputs (ie., the val- 
ues of the luminosity function at selected magnitude points) 
were chosen for emulation: 6 from the bj luminosity and 5 
from the K luminosity functions. These are shown as the ver- 
tical black dashed lines in Fig. [2] along with the full outputs 
from the 1000 runs and the observed data (the error bars 
contain all relevant uncertainties as discussed below) . These 
particular 11 magnitude outputs were chosen as they repre- 
sented the form of the luminosity functions well (and hence 
can be used to reconstruct the luminosity function), they 
were easy to emulate and, most importantly, were also sen- 
sitive to changes in the input parameters implying that they 
are very informative with regards to the input space. This 
last point implies that we can reliably cut out regions of the 
input space using only these 11 outputs, and without being 
forced into emulating the luminosity function in every lumi- 
nosity bin. Analysis of the initial runs, showed that adding 
additional outputs did not significantly improve the char- 
acterisation of the luminosity function, but did risk weight- 
ing implausibility measures too much towards the faint-end 
perfromance of a model. We also note that we did not at- 
tempt to emulate the luminosity function for bj < 16 mag 
or _K' < 20 mag because the limited resolution of the Mil- 
lennium simulation becomes important for some parameter 
values in this region. 



3.5 Constructing the Emulator 

3.5.1 A Simple Example 

Before we describe the construction of the emulator for the 
GALFORM model, it is useful to briefly outline how the 
method might be applied to a simple one-dimensional prob- 
lem. 

The first step is to construct an emulator of the simple 
1-dimensional function shown in Fig. [3] Imagine that the 
function, f{x), is a one parameter model for some measur- 
able quantity. In the left-hand panels, the function (which is 
in fact a simple sine wave) has been evaluated at n = 6 in- 
put points denoted Xi. We use the function output at these 
points, ki — f{xi), to construct an emulator based on a 
random Gaussian process, u{x), that is we say: 



u{x) 



(7) 



where we assume the prior expectation and variance of the 
process u{x) to be 'Ei{u{x)) = and Var(M(a::)) = , and 
that the prior covariance structure is defined to be of Gaus- 
sian form with correlation length 9: 

c{x, x) = Coy{u{x),u{x)) — exp(— (a; — x')^ /O^). (8) 

We can now update the emulator u{x) using knowledge of 
the six evaluations of f(x). The updated emulator at a new 
point x' now has expectation and variance given by: 

E[u(a:')] = t{xYA-^]i (9) 



8 Bower et al. 




Mn,-5log(h) 




MK-5log(h) 



Figure 2. The bj and K luminosity functions from the first 1000 
runs of the model (Wave 1) compared to observational data. The 
data, from Norberg et al. (2002) and Cole et al. (2001) respec- 
tively, are shown as black points with 2cr error bars which in- 
clude all observational and model discrepancy uncertainties as 
described in §3.6| Parameter values were chosen using a Maximin 
Latin Hypercube design spanning the 8 most important param- 
eters (see Table ^ . The vertical dashed black lines show the 11 
outputs chosen for emulation. These provide a good characteri- 
sation of the luminosity function. Note that below fej = 16, some 
luminosity function calculations are affected by numerical resolu- 
tion. The colouring of lines indicates the quality of the match to 
the observed data, with blue colours indicating < 16. 



Var[u(3::') 



-t{x'fA-h{x') 



(10) 



where k = {f{xi),f{x2),...f{xn))'^ (the vector of known 
function values), t(a;') — (c(a;', a;i), c(a;', 0:2), c(a;', a;„))^ 
(the column vector of covariances between the new and 
known points) and A is an n x n matrix with elements 
Aij = c{xi,Xj) (the matrix of covariances between known 
points, eg. Williams 2002). In a fully probabilistic analysis, 
equations (|9| and | |10[ ) would be derived from conditioning 
a Gaussian Process on the 6 known evaluations, and would 
give the mean and variance of the corresponding Normal 
distribution of u{x') at the input point x' . However, here we 
use a Bayes Linear analysis where equations ^ and ( 10 1 are 
derived directly from the Bayes Linear update (described in 
^3.5.4| and given by equations | |15[ ) and ( |17[ )), and are consid- 
ered as primitive quantities which are used directly to assess 
whether parts of the input space are acceptable. 

The random process u{x') quantifies the uncertainty in 
this fit: close to points at which the function has been eval- 



uated the uncertainty is small, while between points it is 
larger. Making a suitable choice for a and 6 is problem spe- 
cific. In this example, we set a to be 0.3 and chose 6 to be 
1/5 of the range of the input variable x. 

Now we suppose that we have some measurement for 
the quantity being modelled. This is shown by the horizon- 
tal black lines (thin lines indicating the measurement un- 
certainty) in the figure. Using our emulator, we now try to 
identify the parameter values at which an evaluation of the 
model might be compatible with measured data. While our 
emulator cannot guarantee that an evaluation will success- 
fully match the data, it identifies the regions at which a 
match is implausible. This is quantified through the use of 
an implausibility function, I{x), which is discussed in more 



detail in [3.7 In this simple example I{x) is defined as: 



I\x) = |E(/(a;)) - z|V(Var(/(a;)) + Var(eo,,)), (11) 

where z — —0.8 is the observation (the middle horizontal 
black line), Var(eoi,s) is the variance of the observational 
errors which in this case were taken to be 0.03^, and E(/(a;)) 
and Var(/(a;)) are given by equations ([9|, (10 1 and ([7|. 

The value of I{x) is shown in the lower left-hand panel 
of Fig. [3] where I(x) is large we reject the parameter val- 
ues from further investigation. However, where I{x) is below 
our cut-off value (in this case 3), we perform a small num- 
ber of additional evaluations of /. We then re-emulate using 
these additional evaluations, and this is shown in the right- 
hand panels. Now the uncertainty in the critical region is 
much reduced and the only "non-implausible" regions are 
closely centered on the regions where / truely matches the 
measurement. This iterative approach is known as "history 
matching" and is explained in more detail below, but it can 
be see that the technique allows us to focus our evaluations 
of / on the regions where additional knowledge is critical. 

Obviously, this one dimensional example is highly sim- 
plified, and an emulator is not required to solve this problem. 
In our GALFORM application we are aiming to search a 
much higher dimensional parameter space and the ability 
to quantify our knowledge of the model is critical. Further, 
evaluation of the emulator is almost 10^ times faster than 
running an individual model, and thus this technique can 
be combined with the Latin Hypercube scheme to explore 
a seemingly vast parameter space. Note that for the simple 
example we have used an emulator consisting of purely a 
Gaussian Process, given by equation ([7|. When we come to 
emulate the GALFORM model we will use a more advanced 
emulator that contains polynomial regression terms, a Gaus- 
sian Process to model the residuals of the regression and a 
white noise term to model ineffective variables. In the sec- 
tions below, we describe the construction of the GALFORM 
emulator in more detail. 



3.5.2 The GALFORM emulator 

Here we describe the construction of the emulators for each 
of the 11 outputs identified in §3.3. It must be stressed that 
although we provide some detail, the construction of the 
emulators and the subsequent analysis involve an extensive 
collection of statistical techniques, and we cannot hope to 
give a comprehensive treatment here. A much more exten- 
sive description is given in Vernon et al. 2010. For examples 
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Emulator of Model Oulpul Emulotor of Model Output 




hpul Parameter X Input Parameter x 



Figure 3. An example of the emulation of a 1-dimensional function (a sine wave). The top left panel shows the emulator f{x) = u(x) 
after six evaluations of the function: the blue line is the E(/(a;)), the two red lines define a credible interval of E(/(a;)) it 2^Var(/(a;)) 
and the black dots are the 6 model outputs. Observational data z is represented as the middle black line, with 2 sigma errors given by 
the top and bottom black line. The bottom left panel shows the implausibility function I{x) in black, with the cutoff of 3 in green. Inputs 
along the x-axis are deemed implausible if I{x) > 3. The top right and bottom right panel show the situation after three more runs have 
been performed in the non-implausible region. We re-emulate, and now, as the emulator is far more accurate, the implausibility naturally 
gives the two regions of the x-axis where the function matches the observed data (approximately around a:: = 33 and x = 43). 



of such techniques used in other applications see Craig 1996, 
Goldstein & Rougier 2006. 

We view the GALFORM model as a function that maps 
the 16 inputs in Table [T] to the 11 identified outputs, and 
denote it by f(x) where f is an 11 component vector of 
outputs and x a 16 component vector of inputs with x — 

{X1,X2,X3, •■•) = (Vhot,disk, Clreheat, Ocool, ■••)• (NotC that WC 

use the scaled variables directly, without taking logs even 
when the variables cover a large range.) Great simplifications 
can be made in the construction of emulators through the 
use of active variables. Often, a subset of the inputs have 
strong effects on the outputs that are to be emulated and we 
call these the Active Variables (see Table [TJ. We define xa 
to be a vector composed of active variables only and model 
their effects on f(x) in detail (Craig 1996). The remaining 
inputs (the inactive variables) have only minor effects on the 
outputs so are treated as contributing a noise process to the 
emulator. The form of the emulator for component i of f (x) 
would then be: 

/i(x) = 9iii^A) + Iti(xA) + lUi(x). (12) 

j 

Here the gij{:KA) are known functions chosen to be first, sec- 
ond or third order polynomial terms in the active variables 
(for example, for output i = 1 we might have terms of the 
form gij{xA) = X1X2, x^ or X1X2X3, with different terms 
corresponding to different values of j); the Pij are coeffi- 
cients of the polynomial which will be fitted using regression 



methods. Ui{xA) is a Gaussian Proces^which also depends 
only on the active variables. The effects of the inactive pa- 
rameters are described by the Wi(x) term, referred to as a 
nugget, which is modelled as a random white noise process. 
The regression term Pij gij (xa) on the right hand side of 
equation \12\ is included to capture the global behaviour of 
the GALFORM function. The Gaussian process m(xa) rep- 
resents localised deviations from this global behaviour, and a 
simple specification is to suppose, for each x, that Ui{x) has 
zero mean, constant variance and Coy {ui{x),Ui{x')) which 
is a function of ||a; — here chosen to be of Gaussian form 
(see equation ( |13[ ) below) . As we perform evaluations of the 
model, the expectation and variance of fi (x) at a given point 
is then updated using the Bayes Linear analysis as described 
in §3.5.4. 

The above describes the general structure for all the em- 
ulators used in this analysis. As we perform the reduction of 
input space iteratively, and at each iteration (or "wave" ) we 
re-emulate changing the specific form for the emulators (see 
the section on History Matching below, §3.8). At each wave 
the number of active variables increases as the emulator be- 
comes more accurate, and the random processes ^^(xa) and 
uii(x) become less significant. 



^ Technically we should refer to ^^(xa) as a weakly stationary 
random process in the Bayes Linear context, as we are making 
no assertions regarding the behaviour of higher order moments of 
ni(xA). 
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Table 2. The 8 candidate active variables for Wave 1 bj luminos- 
ity function output (see Fig. [2j, with the final 5 active variables 
for each bj output marked by an x. The adjusted R? value gives a 
measure of how effective the 3rd order polynomial is in capturing 
the global behaviour of the particular bj luminosity GALFORM 
output. Note the low adjusted of outputs 3 and 4: these were 
not used in the first wave analysis, but did feature in later waves. 



3.5.3 The Wave 1 Emulator. 

Here we outline the construction of the Wave 1 emulators 
(full details, and extensive discussion, are given in Vernon et 
al. 2010). First, 8 of the 16 inputs were chosen as candidate 
active variables due to their clear effect on the luminosity 
output in a set of initial test runs. In choosing the active 
variables, the aim is to explain a large amount of the variance 
of /i(x) using as few variables as possible. Initially, we ran 
GALFORM varying the primary parameters and holding 
the others fixed at their central values. The effect of the 
fixed variables is accounted for through a contribution to 
the model discrepancy term (see §3.6). (Note that in wave 
4, as the region of parameter space becomes more restricted, 
we will allow the full set of parameters to vary.) For each of 
the 11 outputs, the set of 8 parameters was initially reduced 
by backwards stepwise elimination, starting with a model 
containing the 8 linear terms in x. Then individual inputs 
were discarded in turn based upon the significance of their 
main (i.e. linear) effect. Before an input would be discarded, 
a full third order polynomial was fitted to see the extent of 
variance explained with the current set of active variables. It 
was found that 5 active variables could explain satisfactory 
amounts of the variance of /i(x) for each output i, based on 
the adjusted Ft?' of the polynomial fits [E? is the Coefficient 
of Determination of the fit and takes values between and 1, 
with higher values implying that the fit explains more of the 
model output's behaviour). Note that the subset of 5 active 
variables is in general different for each output variable, as 
shown in tables [2] and |3] Including more than 5 variables 
would yield little extra benefit at this stage, while using 
fewer than 5 leads to a significantly worse description of the 
main trends. 

Once the set of active variables has been determined, 
the full set of regression terms (the jSij gij{x.A)) can be cho- 
sen. This was done by starting with the full 3rd order polyno- 
mial in the 5 active variables and using backwards stepwise 
elimination to remove less significant terms from the model. 
Note that a large number of model evaluations are required 
to enable the fitting of a third order polynomial, although 
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Table 3. The 8 candidate active variables for the Wave 1 K- 
band luminosity function output (see Fig. with the final 5 
active variables for each K-band output marked by an x. Note 
the low adjusted of output 3: this was not used in the first 
wave analysis, but did feature in later waves. 

this greatly depends on the number of active variables. Now 
that the final regression terms have been chosen for each 
output /i(x), estimates for the set of {/3ij} coefficients can 
be obtained using Ordinary Least Squares, assuming uncor- 
related errors (a reasonable assumption as we have a large 
number of runs sufficiently far apart in the input space that 
the residuals should not be strongly correlated). Note that it 
is important to check that the structure of the polynomials 
obtained from this process agree with physical intuition. 

For the two contributions to the residual process Mi(xA) 
and Wi(x) we specify a correlation structure as follows. As 
the u,;(xa) represent local deviations from the regression 
surface we assume that there will be a large correlation be- 
tween Ui at neighbouring values of the active inputs x^, and 
specify the following Gaussian covariance structure: 

Cov(ui(xA),iti(x^)) = o-^. exp(-||xA - x;4||^/6'f), (13) 

where cr^. is the point variance at any given xa, Oi is the 
correlation length parameter that controls the strength of 
correlation between two separated points in the input space 
(for points a distance 9 apart, the correlation will be exactly 
exp(— 1)), and || • || is the Euclidean norm. As the nugget 
process ?iii(x) represents all the remaining variation in the 
inactive variables, it is often small and we treat it as uncor- 
related random noise with Var(K;i(x)) = ct^. . We consider 
the point variances of these two processes to be proportions 
of the overall residual variance: (which is obtained from 
the OLS regression fit), and write that crj;. = (1 — 5i)ai and 
(j^. = Sia^ for some usually small Si. 

Various techniques for estimating the correlation length 
and nugget parameters 9i and Si from the data are available 
(eg. variograms, REML, maximum likelihood: Cressie 1991), 
however an alternative is to specify them from previous ex- 
perience of computer models (Craig et al. 2001, Kennedy & 
O'Hagan 2001) which is the approach we adopt here. Specif- 
ically, in Wave 1 we choose Oi = 0.35 and S — 0.2, remember- 
ing that the inputs have all been scaled so that their range 
is [—1, 1]. The choice for theta is motivated by the fact that 
we are fitting third order polynomials to the model output, 
and therefore the residuals from this fit, which are mod- 
elled by the stationary process, will behave like a fourth (or 
higher) order polynomial. This suggests a correlation length 
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of 0.35 would be reasonable. The value for 5 is assessed by 
examining the variance of the model output explained by the 
inactive variables by fitting various polynomials using only 
the inactive variables. It should be noted that the choices 
for 9 and S are more conservative than values obtained us- 
ing alternative estimation techniques, and that this was a 
deliberately cautious choice. More details of the motivation 
of these parameter choices, and description of the diagnostic 
used to confirm the accuracy of this approach are given in 
Vernon et al. 2010. 

The next step is to update the process /i(x) at a new 
point X, with the information contained in the 1000 runs 
of the model. We do this using the Bayes Linear update 
formula discussed in the next section. 



3.5.4 Bayes Linear Approach. 

For large scale problems involving computer models such 
as GALFORM, a full Bayes analysis (involving probability 
distributions for all random quantities) is difficult for the fol- 
lowing reasons. Firstly, it is very difficult to give a meaning- 
ful full prior probability specification over high-dimensional 
input spaces. Secondly, the computations for learning from 
both observed data and runs of the model, and choosing in- 
formative runs, may be technically very challenging. Thirdly, 
in such computer model problems, often the likelihood sur- 
face is extremely complicated, and therefore any full Bayes 
calculation may be extremely non-robust. However, the ba- 
sic idea of capturing our expert prior judgments in stochastic 
form and modifying them by appropriate rules given obser- 
vations, is conceptually appropriate. 

The Bayes Linear approach is (relatively) simple in 
terms of belief specification and analysis, as it is based only 
on the mean, variance and covariance specification which, 
following de Finetti (1974), we take as primitive. Therefore, 
a Bayes Linear approach proceeds by the specification and 
modification of mean and variance structures only. 

We replace Bayes Theorem (which deals with full prob- 
ability distributions) by the Bayes Linear adjustment which 
is the appropriate updating rule for expectations and vari- 
ances. The Bayes Linear adjustment of the mean and the 
variance of a random quantity B given data D is: 

Ed[B] = E(B) + Cov(B,D)Var(_D)~'(_D-E(D))(14) 

(15) 

YhtdIB] = Var(B) - Cov(B, D)Var(D)"^Cov(D, Bp6) 

(17) 
(18) 

Ed[B], Vari5[_B] are the expectation an d va riance for B ad- 
justed by knowledge of In equations ( 15 1 and ( 17 1, B and 
D can represent scalars or vectors of uncertain quantities. In 



the later case ( 15 1 and (1171) become matrix equations where 



if S is a vector of length ub and D is a vector of length no, 
Cov{B, D) is a matrix of dimension x no and Var(D) a 
matrix of dimension no x no. 



^ Ej3 [B] and Varjj [B] are the corresponding Bayes Linear quan- 
tities to E(_B|Z)) and Var(B|D), the conditional expectation and 
variance of B given data D that would be extracted from a fully 
Bayesian analysis. 



The Bayes linear adjustment may be viewed as an ap- 
proximation to a full Bayes analysis, or more fundamentally 
as the "appropriate" analysis given a partial prior specifica- 
tion based on expectation. For more details see Goldstein & 
Wooff (2007). 



3.5.5 Updating the Emulator. 



Equations ( 15 I and ( 17 1 give the rule for updating the emu- 



lator with knowledge of the 1000 model evaluations, where 
the random quantities B and D will represent an unknown 
output and the collection of 1000 known outputs respec- 
tively. We proceed with the update as follows. As we have 
a relatively large number of runs, we first assume that the 
regression coefficients fiij in the emulator equation ( 12 I are 



known and hence have zero variance. As Ui{xA) and Wi(x) 



both have zero expectation, equation ( 12 \ gives the expec- 



tation of model output i at input x to be: 

E(/»(x)) = ^/?,jgij(xA) 

and the variance of /i(x) to be: 
Var(/i(x)) = Var(ui(xA)) + Var{wi{x)) = r 



(19) 



(20) 



As the Mi(xA) and the Wi(x) terms are uncorrelated, the 
covariance between output i at two different inputs x' and 



X can now be written (using equations ( 12 I and ( 13 1): 
c(x',x) = Cov(/.(x'),/,(x)) 

= Cov(Mi(xA), Mi(xa)) + Cov(uii(x'), Uli(x)) 

= o-u, exp(-|lx^ - xaIIV^?) + '^^i'^x'x (21) 
where S^i^ is a Kronecker delta, equal to 1 when x' = x 



and zero otherwise. The second term in equation (21 1 comes 
from the nugget Wi{x.) which gives a zero contribution except 
when x' — x. 

We can now define the following quantities correspond- 
ing to the n — 1000 model evaluations. We write the 
locations of the n runs in input space as Xj with j — 
l,..,n where each Xj represents the vector of inputs for 
the jth run. Similarly xaj is defined to be the vector of 
Active Variable inputs for the jth run. We define Di = 
(/i(xi), /i(x2), /i(x„))"^, that is the column vector of the 
n evaluation outputs for output i, the prior expectation of 
which (E(_Di)) can be found using equation ( |19[ ). 

Replacing the random quantity B in the Bayes Linear 
update equation (15 1 with the unknown output /i(x) at in- 
put x gives the adjusted expectation Eq. [/^(x)] to be: 

Ei5,[/.(x)] = E(/.(x))+Cov(/,(x), A)Var(A)-'(A-E(A)), 



which becomes (using equation (19l) 



EdJ/,(x)] = ^/3ygy(xA)+t(x)^A~'(A:-E(A)), (22) 

j 

where now t(x) — (c(x, xi), c(x, X2), c(x, x„))^ — 
Cov(/i(x), Di) is the column vector of covariances between 
the new and known points, and A is the matrix of covari- 
ances between known points: an n x n matrix with elements 
Ajk = c(xj,Xfc). The Adjusted Variance Var_D. [_B] can simi- 
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larly be found from equation (171 and (20 1 giving: 



VarB.[/,(x)] = 

= Var(/,(x)) - Cov(/.(x), A)Var(A)"'Cov(A,/»(x)), 



: (7? -t(x)^A~4(x). 



(23) 



The Adjusted Expectations and Variances, E_Dj/i(a^)] 



and Var_D; [B] , given by equations ( 22 1 and ( 23 1 form the 
basic ingredients in the construction of the Implausibihty 
measure used to reduce the input space to a much smaUer 
"non-implausible" volume. Note that if we had chosen a sim- 
ple emulator of the form fi (x) = Ui (x) , such as was used in 
the ID example in j |3.5.1[ then equations ( |22[ | and ( |23[ ) would 
reproduce exactly equations (|9| and (10 1. 



say), will become smaller. In principle, this process of 
model improvement can be built into our statistical emula- 
tion so that we use our knowledge of previous versions of the 
GALFORM code both to speed up parameter exploration 
in newer versions, and to obtain more realistic representa- 
tions of the Structural Uncertainty (Goldstein & Rougier 
2009); however, we have not explored this possibility here. 

As we are employing a Bayes Linear approach we only 
need to specify expectations and variances for Emd'- we give 
a summary of this process here, the full details of which can 
be found in (Vernon et al. 2010). 

We decompose emd into three uncorrelated contribu- 
tions, each of which are 11-vectors that are assumed to have 
zero expectation: 



3.6 Linking the Model to the System: Structural 
Uncertainty and Model Discrepancy 

In order to declare regions of the input space as "implausi- 
ble", and to then exclude them from the analysis, we need 
to formally link the GALFORM model f (x) to the observed 
luminosity function data which we represent as the 11 com- 
ponent vector z. We do this by linking both f(x) and z to 
the actual system (in this case the real Universe) represented 
by y, taking into consideration the Structural Uncertainty. 
In a rigorous Bayesian approach, this step is key in order 
to justify any further uncertainty statements; it is, however, 
a relatively unfamiliar process to many scientists. We em- 
ploy a description that is widely used in computer modeling 
studies (eg., Craig 1996; Kennedy & O'Hagan 2001; Gold- 
stein & Rougier 2009). This involves the notion that when 
we evaluate GALFORM at the actual system properties, x* 
say, then we aim to reproduce the actual system behaviour 
y. This does not mean that we would expect perfect agree- 
ment between f (x*) and y. Although GALFORM is a highly 
sophisticated simulator, it still offers a necessarily simplifed 
account of the evolution of galaxies, and involves various 
numerical approximations. The simplest way to view the 
difference between f* = f(x*) and y is to express this as: 

y = f * + Cmd, 

where we consider the 11-vector e,nd as a random variable 
uncorrelated with f*. The "Model Discrepancy" term e^nd 
represents the Structural Uncertainty. It comes from our 
judgments regarding the accuracy of the model and deter- 
mines how close a fit between model output, f*, and an 
observation of y we require for an acceptable level of consis- 
tency between theory and observation. 

As GALFORM is an approximation to the physical pro- 
cesses that occur during galaxy formation in the real Uni- 
verse, we must acknowledge and attempt to quantify the 
level of this approximation, represented by Emd, in order for 
further analysis to be meaningful. While this is a difficult 
task, ignoring the model discrepancy will lead to all future 
statements being conditional on the current version of GAL- 
FORM being a perfect model of the Universe. Since we know 
that this is not the case, it is essential that we build in some 
degree of fuzziness into the comparison between the model 
and data. Failure to do so may result in us prematurely re- 
jecting regions of parameter space in which a solution of 
interest resides. As the level of approximation in the model 
is reduced (by considering an improved version of the model 



emd = *IA + + 

Here ^ia represents the discrepancy due to the eight inac- 
tive parameters that we did not model in detail in the initial 
waves of the analysis (that is, the parameters that do not 



feature in Table 3.5.31. We assessed Var($/A) from a small 
set of runs over the 8-dimensional inactive parameter space 
and found for i = 1, .., 11 that 0.015^ < Var($7A,0 < 0.32^ 
In later waves, we performed runs across all 16 inputs and 
hence the ^ia term was then set to zero as this model dis- 
crepancy was now absorbed directly into the emulator. 

^ DAI is the discrepancy due to the finite number (40) of 
sub-volumes used for the model runs: Var($DM) was found 
by analyzing the sample variance of the 1000 Wave 1 runs 
across sub- volumes, and it was estimated for i = 1,...,11 
that 0.014^ < Var($DM,i) < 0.022^ For a sub-set of 100 
runs, we confirmed this estimate by drawing a random set 
of 40 sub- volumes. 

summarises the structural deficiencies of the full 
GALFORM model itself, and is the component derived from 
subjective judgments regarding model accuracy. We pro- 
ceeded by generating a random test set of luminosity func- 
tions by perturbing the observational data and smoothly in- 
terpolating. These were then compared to the observational 
data using an interactive tool which asked the expert user 
to judge whether to reject the corresponding region of pa- 
rameter space on the basis of the comparison and previous 
experience of improvements to the GALFORM code and 
changes in cosmological models. Summarising these tests, 
we concluded that a credible interval around the observa- 
tional data, within which runs would be deemed acceptable 
would be approximately a factor of 2 wide in terms of galaxy 
counts, and hence ±| logiQ(2) on the log scale used through- 
out this paper (see for example Fig. |2|. Relating this to a 
±2(T interval (a conservative choice) this leads to the vari- 
ance of each of the 11 components of $_b being assigned 
values: Var($B,») = (logi(,(2)/4)2 = 0.0753^ for i = 1, .., 11. 
Reassuringly, this results in the Bow06 model being close 
to the boundary of acceptable solutions. Examples of the 
range of fits that are deemed acceptable are illustrated in 
Fig. [s] The expectation was again set to E{^E,i) ~ 0, as 
it was thought that there were no significant asymmetries 
concerning this component of the Model Discrepancy. It is 
important to realise that while this assessment for is 
necessarily subjective, it was also chosen to be deliberately 
conservative. Once the volume of acceptable inputs has been 
identified corresponding to all uncertainties discussed in this 
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section, it is then possible to explore the effect of reducing 
the size of Var($B). 

Finally, we must make allowance for the uncertainty in 
observational measurements. Since, we cannot observe the 
system y (i.e. the actual Universe) without measurement 
error, we link it to the observations z by: 

Z = y + Eobs, 

where eobs is again a random quantity that represents the 
observational errors. It has expectation zero, and variance 
composed of contributions from the luminosity calibration 
uncertainty, the normalisation uncertainty, k + e errors and 
Poisson errors (see Norberg et al. 2002 for a discussion of 
how these terms are estimated). Fig. [2] shows the 2-sigma 
error bars formed from the combination of all components 
of Var(em(i) and Var(eoi,s). It should be noted that in most 
cases the Model Discrepancy terms dominate over the obser- 
vational errors. (Fig. |5] shows the same error bars minus the 

component which is no longer relevant, as by Wave 4 we 
have modelled the effect of the remaining inactive variables 
within the emulator directly.) 

With this structure linking f(x), y and z in place we 
can now proceed to learn about acceptable values of x. 

3.7 Implausibility Measures 

We want to learn about which values of the input parame- 
ters X are likely to give an acceptable match between model 
output and observational data. We do this through use of 
an Implausibility Measure /(x) defined over the input space. 
The Implausibility Measure describes the magnitude of the 
difference between the expected value of the GALFORM 
outputs and the observational data, standardized with re- 
spect to all relevant uncertainties. The basic idea is that for 
a particular value of x, if /(x) is large then we can discard 
this value of x as it is highly unlikely to yield a good match 
between model output and the observational data. 

Using the emulator, the model discrepancy and the 
measurement errors we define the Univariate Implausibility 
Measure, at any input parameter point x, for each compo- 
nent i of the computer model f (x) as: 

7f,)(x) = |Eo,(/,(x)) - z,|VVarz,,(Eo,(/.(x)) - z,) (24) 

where Ejj. (/^(x)) and Varu. (/i(x)) are the emulator expec- 
tation and variance adjusted by Di and Zi is the observed 
data for component i. Introducing the model discrepancy 
and observational error terms, this can be re-written as: 

r2 ^ |Ed.(/.(x))-z,P 

^ (VarB,(,f,(x)) + Var(e™d,0 + Var(eoi,..0) 

where Var(em£i,i) and Var(eo6s,0 are the (univariate) Model 
Discrepancy variance and Observational Error variance. 

When /(i)(x) is large this implies that, even given all 
the uncertainties present in the problem, we would be un- 
likely to obtain a good match between model output and 
observed data were we to run the model at input x. This 
means that we can cut down the input space by imposing 
suitable cutoffs on the implausibility function (a process re- 
ferred to as History Matching). Regarding the size of /(i)(x), 
if we assume that for fixed x the appropriate distribution of 
(/i(x*) ~ Zi) is both unimodal and continuous, then we can 
use the 3a rule which implies that if x = x* , then (x) < 3 



with a probability of approximately 0.95. This is a powerful 
result that applies to any distribution that is unimodal and 
continuous, even if it is asymmetric. It suggests that val- 
ues higher than 3 would imply that the point x should be 
discarded. This is still a very conservative bound: we would 
expect the distribution of (/i(x*) — Zi) to be somewhat bet- 
ter behaved and hence choose slightly tighter bounds, as 
discussed in § |3.8[ 

It should be noted that since the implausibility relies 
purely on means and variances (and therefore can be evalu- 
ated using Bayes Linear methodology), it is both tractable 
to calculate and simple to use to reduce the input space. 

One way to combine these univariate implausibilities is 
by maximizing over outputs: 

hi (x) = max 7(i) (x) 

i 

We can similarly define 72A/(x) and 73m(x) to be the sec- 
ond and third highest of the 11 univariate implausibility 
measures at the point x. These are clearly more conserva- 
tive measures since a model will not be deemed implausible 
on the basis of a single bin. 

If we construct both a multivariate emulator and mul- 
tivariate model discrepancy (as is described in detail in Ver- 
non et al. 2010), then we can define the corresponding mul- 
tivariate Implausibility measure: 

llM^) = (ED(f(x))-z)^Varo(ED(f(x))-z)-^(EB(f(x))-z), 
which becomes: 

/Ifv(x) = (ED(f(x))-z)^ X (26) 
(Varo(f(x)) + Var(e™d) + Var(eo6,))-'(Ei3(f(x)) - z).(27) 

where f(x) is the full 11-vector model output and 
Var_D(f(x)), Var(emd) and Var(eoi,s) are all 11 x 11 covari- 
ance matrices. Again, large values of /A/y(x) imply that we 
would be unlikely to obtain a good match between model 
output and observed data were we run the model at input 
X. Choosing a cutoff for Imv{x) is more complicated. As a 
simple heuristic, we might choose to compare JMy(x) with 
the upper critical value of a distribution with degrees of 
freedom equal to the number of outputs. For further discus- 
sion of implausibility measures, see Vernon et al. 2010. 



3.8 History Matching via Implausibility 

History Matching is the process of identifying the set X* of 
all possible values of x* , that is the set of points that would 
give acceptable matches between model output and obser- 
vational data. Identifying X* is a difficult task as often X* 
represents a complicated object in a high-dimensional space. 
X* could also comprise disconnected volumes, which could 
even possess non-trivial topology. In many applications X* 
occupies an extremely small fraction of the original input 
space, with large volumes of input space leading to very 
poor matches to the observed data. 

We employ an iterative technique where the Implausi- 
bility Measures are used to perform the History Matching 
process. The basic strategy is based around discarding val- 
ues of X that are highly unlikely to yield acceptable matches 
between model output and observational data. This is done 
by applying a cutoff on the Implausibility Measures defined 
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in § |3.7[ As the Implausibility Measures are constructed us- 
ing the emulator, they are fast to evaluate and therefore we 
can efficiently identify values of x that will be discarded, for 
example, in Wave 1 we discard all values of x that do not 
satisfy both: 



/2m(x) < hcut and /3a/(x) < I3, 



(28) 



where /2a/(x) and /3m(x) are the second and third highest 
univariate implausibility measures defined in § |3.7| and l2cut 
and Iscut are the corresponding implausibility cutoffs. Ta- 
ble |4] shows all the implausibility measures used in each of 
the waves along with the corresponding cutoffs. Note that 
in early waves we make the conservative choice of using only 
/2m(x) and /3m(x) (and not 7m (x)), so that the cutoff we 
impose is not sensitive to the possible failings of an individ- 
ual emulator point on the luminosity function. This allows 
slightly tighter cuts to be chosen for /2cut and Jscut as is 
shown in table H) 

Equation ( |28[ l defines a volume of input space that 
we refer to as non-implausible and denote Xi. This non- 
implausible volume should hopefully contain the set X* , 
that is X* C A*!. In the first wave of the analysis which we 
are describing here, Xi will be substantially larger than X* . 
This is because it will contain many values of x that only sat- 



isfy the implausibility cutoff given by equation ( 28 \ because 



of a substantial emulator variance Var(f (x)). If the emulator 
had a high degree of accuracy over the whole of the input 
space so that Var(f(x)) was small compared to the Model 
Discrepancy and the Observational Error variances, then the 
non-implausible volume defined by X\ would be comparable 
to X* and the History Match would be complete. However, 
to construct such an accurate emulator for any realistic com- 
puter model (and especially for GALFORM) would require 
an infeasible number of runs of the model. Even if such a 
large number of runs was possible it would be an extremely 
inefficient method: we do not need the emulator to be highly 
accurate in regions of the input space where the outputs of 
the model are clearly very different from the observed data. 

This is the main motivation for our iterative approach. 
In each wave we design a set of runs over the current non- 
implausible volume denoted Xi, emulate using these runs, 
calculate the implausibility measure and impose a cutoff 
to define a new (smaller) non-implausible volume denoted 
Xi+i which should satisfy X* C Xi+i C Xi. As we progress 
through each iteration the emulator at each wave will be- 
come more and more accurate, but will only be defined over 
the previous non-implausible volume defined by the previous 
wave's implausibility. 

As we proceed through waves of emulation the vol- 
ume being emulated decreases, and the GALFORM function 
should become smoother over the restricted range of inter- 
est. As a result, it becomes easier to capture more of the 
behaviour using the regression terms in the emulator (ie., 
by fitting a cubic polynomial to the GALFORM output). 
Furthermore, the density of runs that inform us about the 
models behaviour increases and the Gaussian Process part 
of the emulator becomes more accurate. As the output of the 
runs have been restricted and the efi'ects of certain dominant 
variables limited, it becomes easier to identify additional ac- 
tive variables which are then used in both the regression and 
the Gaussian Process terms, further increasing the accuracy 
of the emulation. 



This iterative process is continued until the emulator 
variance is smaller than the model discrepancy variance and 
observational error variance. We have completed 4 iterations 
or Waves in this analysis, and the subsequent results in this 
paper are from Wave 4. 

3.9 Projection Pursuit 

The end result of the emulator analysis is to identify a region 
of parameter space in which models produce an acceptable 
fit to the bj and isT-band luminosity functions. However, 
comprehending the resulting space is rather challenging. As 
we will show, while the parameter space of acceptable model 
occupies only a small fraction of the overall parameter space, 
acceptable fits can be found over a wide range of input pa- 
rameters. This situation arises because the acceptable space 
takes the form of a thin curved hyper-surface. 

The aim of projection pursuit is to select a suitable 
co-ordinate system that allows the geometry of the accept- 
able region to be better understood. We achieve this using 
principal component analysis (PGA, eg. JoUiffe 2002, Zito 
et al 2009). However, in contrast to many applications of 
PCA, we are primarily concerned with the components with 
smallest variance. These components define an optimal set 
of projections for displaying the data, and the relation be- 
tween the PCA vectors and the input parameters. The latter 
connection has the potential to inform us about the physics 
of galaxy formation. 



3.10 Exploring Constraints from Additional 
Datasets 

We have adopted a strategy in which the primary calibra- 
tion of our model comes from the local bj and K band lu- 
minosity functions. Nevertheless, we wish briefly to explore 
whether adding additional data sets would impose further 
constraints on the range of acceptable model parameters. 
In this paper, we do not aim to make an exhaustive ex- 
ploration of the possible data sets and limit our attention 
to just a small fraction of the possible local data. We use 
a simple statistic to assess the relative performance of 
models in these additional tests and ask about the region 
of parameter space which matches the additional data at a 
similar level of performance to Bow06 (as well as adequately 
matching the observed luminosity functions). As we show in 
§5.2, the model experiences contradictory pressures from the 
observed disk sizes and the normalisation of the TuUy- Fisher 
relation, possibly indicating that a revised treatment of an- 
gular momentum is required in the Bow06 version of the 
GALFORM code. This clearly illustrates the need to care- 
fully define the model discrepancy terms for these additional 
data sets before they are used to exclude regions of parame- 
ter space. We present further exploration of additional data 
sets in a future paper (Benson & Bower, 2009). 



4 RESULTS 

The results described below were obtained with 4 waves of 
emulation. The implausibility cut-off threshold for each wave 
is shown in Table[4] together with the fraction of the param- 
eter space considered acceptable after the emulator has been 
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Wave 


Runs 


#Act. 


^cut 




^3cut 


^MVcut 


1 


1000 


5 




2.7 


2.3 




2 


1400 


8 




2.7 


2.3 




3 


1600 


8 




2.7 


2.3 


26.75 


4 


2000 


10 


3.2 


2.7 


2.3 


26.75 


5 


2000 




2.5 









% Space 



Table 4. The fraction of parameter space considered acceptable 
in each wave of emulation. Column 1: the wave; Column 2, the 
number of model runs used to construct the emulator; Column 
3, the number of active variables; Column 4-7, the implausibility 
cut-off threshold; Column 8, the fraction of the parameter space 
estimated to be acceptable. Note that in Wave 5 we do not con- 
struct an emulator, but we impose a cutoff of Ij^.j < 2.5 on the 
Wave 5 runs to generate the 113 acceptable runs used in ijs] 





Min 


Max 


-1 


46 


1000 


ce* 


-3.2 


-0.3 


Pyiold 


0.02 


0.05 


Vhot.disk 


300 


550 


^^ot, burst 


190 


550 


Qhot 


2.3 


3.7 


^reheat 


0.2 


1.2 


'^cool 


0.38 


1.2 


/df 


0.8 


2.7 


/stab 


0.73 


0.95 



Table 5. This table shows the parameter ranges of models which 
gave acceptable luminosity function matches. The inactive vari- 
ables range over their full range given in Table ^ 



14.9 % 
5.9 % 
1.6 % 
0.26 % 
(0.014%) 



constructed. The table also gives the number of runs used 
during each wave. 

After 4 waves of emulation, the uncertainties in the em- 
ulator are small and the implausibility of each run is be- 
coming dominated by the intrinsic model discrepancy emd- 
Uncertainties in the observational measurement of the lumi- 
nosity function make almost negligible contribution, and the 
dominant contribution to the model uncertainty comes from 



the model discrepancy term, (see discussion in S3.6I. 

At this point, the emulator suggests that only 0.26% 
of the initial parameter volume is "not implausible" . While 
this volume is small, we will show below that an "accept- 
able" fit can be obtained with a wide range of values for some 
parameters. Note that we are being careful in our use of lan- 
guage here. We do not know for certain what the outcome 
of running the model will be at a particular set of parameter 
values, except close to values at which we have already per- 
formed a model run. We do, however, have a prediction for 
the expectation and variance. Fig. [6] shows a comparison of 
the expectation of the emulator and its uncertainty with the 
results of actual model runs and we discuss this comparison 
further in §4.2| However, as we should expect, only a frac- 
tion of the runs within the "not implausible" region actually 
result in sufficiently good fits to the luminosity function to 
be considered "acceptable". There are two factors involved 
here. Firstly, we are tightening the required implausibility 
from 3.2 to 2.5. In 16 dimensions, this results in a large re- 
duction in the surface of the interesting region. Secondly, 
for many runs the expectation of the emulator is that the 
model implausibility lies above 2.5, but the residual emu- 
lator variance cannot rule out the region as unacceptable 
without direct evaluation. This uncertainty arises from em- 
ulator variance, and not from observational error or model 
discrepancy terms. 



4.1 Emulating the Luminosity function 

A serious problem for such high-dimensional parameter sets 
is to find a way of representing the implausibility map. Pro- 
jecting the full 10 dimensional map (of active variables) 
down to 2 dimensions so that it can be printed loses con- 
siderable information. We can try to compensate for this 
by showing the full set of projections as a matrix (this is 
commonly referred to as a "pairs plot" ) . Fig. |4] shows the 
implausibility space projected onto pairs of parameters in 



this way. Note that only active variables are shown so that 
there are 45 plots of 10 variable pairs. The parameters have 
been scaled to range over ±1 using the initial range given in 
Table El 

Plots below the diagonal show the projected minimum 
implausibility surface. The colour code is set so that green 
indicates that the region is "not implausible" . The implausi- 
ble region is shown in red. The minimum implausibility is de- 
termined by evaluating the emulator over a grid of values for 
the two "visible" parameters and a Latin hypercube of pa- 
rameters in the unseen variables. Because the hyper-volume 
of acceptable solutions is very thin in some projections, a 
large number of evaluations are required in order to obtain 
a reliable projection of the minimum value. Even though 
each evaluation is almost 10^ times quicker than performing 
a GALFORM model evaluation, this means that such plots 
cannot be made interactively. 

It is immediately apparent that many of the projec- 
tions contain a substantial fraction of green covering a large 
fraction of the parameter space. For example, the Ocooi pa- 
rameter was allowed to vary in the range 0.1 to 1.2, and it is 
not implausible to find acceptable fits through-out this re- 
gion. This is the result of projecting over a large number of 
hidden parameters, however, and it is apparent that varying 
the visible parameters had physical effects that can be com- 
pensated for by variations in the other parameters. In order 
to better appreciate the underlying geometry, therefore, it 
is helpful to plot the "optical depth" of the projected hyper- 
volume. Therefore above the diagonal, we show the fraction 
of the evaluations (for each pair of fixed visible parame- 
ters) that resulted in low implausibility values. This allows 
the viewer to distinguish regions that have a low minimum 
implausibility but require very precise coordination of the 
unseen parameters, from regions in which a low implausibil- 
ity is obtained for a wide range of the unseen parameters. 
One has the intuitive sense that the "best" solution lies in 
a region that has a large depth, but this assertion does not 
necessarily hold. Thus the appearance of regions in this op- 
tical depth plot needs careful interpretation. For reference, 
the Bow06 model is shown by a black point. It is often cen- 
tered in a "deep" region of low implausibility, but in some 
projections it is offset from the depth weighted centre of 
the region. It is apparent from this that the Bow06 is not 
a "typical" model that matches the zero redshift luminosity 
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Figure 4. 2-D Projections of the implausibility landscape of the Wave 4 emulator for some of the major parameters. Plots below the 
diagonal show the projected minimum implausibility predicted by the Wave 4 emulator. We sample data points in the hidden dimensions 
using a latin hypercube design, accumulating the minimum implausibility and the fraction of points lying below the implausibility cut-off 
(see Tabic. [4]l. The emulator suggests that green regions are likely to give acceptable fits to the luminosity function data, for some choice 
of the hidden parameters (given our model discrepancy). In the red region the emulator confidently suggests that acceptable matches 
are implausible, regardless of the values of the hidden parameters. For comparison, the Bow06 model is shown as a black point. Plots 
above the diagonal give an impression of the line-of-sight depth of the acceptable region with blue/purple regions indicating that a high 
fraction of the "not implausible" points are aligned at this projected position. Note that plots above the diagonal have x and y axes 
transposed to make comparison with the maximum implausibility plots more apparent. Although it seems that only a small region of 
the projected space is ruled out as implausible, the projected space frequently has a very thin, but extended, geometry. 



function: illustrating the limitations of searching parameter 
space to find a single acceptable model. 

Higher dimensional projections can also be used to re- 
veal more of the underlying structure. The typical hyper- 
surface geometry is that of a thin, slightly curved plane. For 
a few variables, it is nevertheless informative to look at the 
range of the "not implausible" region in a 1 dimensional 
sense, and this is given in Table |5] While this region covers 
a large fraction of the initial range for many parameters, a 
few of the parameters are significantly constrained. For ex- 
ample, the emulator shows that it is implausible that runs 



with low values of Vhot.disk (below 300kms~^) will result 
in acceptable fits to the luminosity functions. Similarly, the 
emulator suggests that the disk stability parameter cannot 
be reduced below 0.73, showing that disk instabilities are a 
key component of the model and that the role of instabilities 
cannot be replaced by altering the sensitivity of galaxies to 
high mass ratio mergers (Parry et al. 2009). 

The parameters that are shown as inactive in Table [l] 
have no clear effect on the luminosity function and are 
treated as an additional source of uncertainty in the emula- 
tor. Runs close to the implausibility cut-off may result in ac- 
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Figure 5. The bj and K luminosity functions of the model runs 
evaluated in Wave 5. Runs are colour coded by the model run 
implausibility. For comparison, the Bow06 model is shown as a 
blue line. Note that 2a error bars are shown. The inner error bars 
include the contributions of observational errors and repeatabil- 
ity, but exclude the subjective #£; term. Outer error bars also 
include the full model discrepancy term. Green curves are con- 
sidered acceptable fits and are tested against other observational 
constraints in §4.3| Vertical lines show the magnitudes at which 
the model is compared to the observational data in order to de- 
termine its implausibility. 



ceptable fits to the luminosity functions if these parameters 
are carefully chosen. However, our procedure is conservative 
and this is taken into account in deciding whether a region 
is implausible or not. However, while the emulator can iden- 
tify regions for which an acceptable fit is implausible, it does 
not guarantee that a run in the remaining region will actu- 
ally result in a good fit to the luminosity function. This is a 
consequence of our conservative approach: the "not implau- 
sible" region has a cut-off threshold of Im < 3.2, while we 
only deem a model to be "acceptable" if Im < 2.5. There- 
fore, in order to demonstrate that an acceptable match to 
the luminosity function can be obtained, we must perform 
a model run at the point in question. 



4.2 Comparing the Emulator with Model Runs 

In the previous section, we illustrated the shape of the low 
implausibility region in parameter space. It is important to 
stress that this is a region in which good fits might be ob- 
tained, but that an acceptable match is not guaranteed. To 



proceed, we investigate the success of the emulator tech- 
nique by randomly generating 2000 model runs (which we 
refer to as "Wave 5") with various parameter values within 
the not implausible region of Wave 4. Our aim is twofold. 
Firstly, we wish to show that the model runs do indeed re- 
turn matches to the luminosity function within the expected 
uncertainties, and that good descriptions of the luminosity 
function are indeed found throughout the range of parame- 
ter space illustrated in Fig. [4] Secondly, we wish to identify 
a set of model runs that give acceptable matches to the ob- 
served luminosity functions, and that can be used as the 
basis for our exploration of the constraints imposed by ad- 
ditional datasets. 

Since we are now comparing genuine model evaluations 
with the data, the emulation stage is no longer required and 

Varo;(/i(x)) in eq. 
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the emulator variance term, 
placed by a small factor based on the stochastic variation 
in model evaluations. We denote this revised implausibility 
measure I'm ■ The denominator in I'm thus includes only con- 
tributions from the model discrepancy, observational errors 
and the repeatability of model runs. Because we are now re- 
ferring to actual model evaluations, we can refer to a model 
as "acceptable" rather than "not implausible". We identify 
"acceptable" models as those for which I'm < 2.5. It is im- 
portant to note that we distinguish acceptable models by 
a lower implausibility cut than the threshold used to reject 
implausible regions (see Table |4|. 

The luminosity functions of the Wave-5 model runs are 
shown in Fig. [5] The lines have been colour code to refiect 
the implausibility derived for the run using the colour scale 
of Fig. |4] Note that the 2a error bars shown on the observa- 
tional points include the effect of the model discrepancy and 
repeatability. Around the knee of the luminosity function the 
statistical errors on the luminosity function are small and 
are dominated by the model discrepancy term. Vertical lines 
show the points at which we have emulated the luminosity 
function. An acceptable luminosity function must pass close 
to the error bars at the lines. As can be seen, this generally 
provides a good description of the shape of the luminosity 
function and justifies our assertion that the shape is well 
described by the 11 outputs chosen for emulation. In com- 
mon with previous version of the code, acceptable models 
tend to lie above the lowest luminosity measurements. The 
distinction between green ("acceptable") and yellow ("not 
quite acceptable" ) models is that the later tend to miss one 
or more data points. It is clear, however, that the points 
at the bright end of the luminosity function, particularly in 
the &j-band, present the greatest challenge for the galaxy 
formation model. 

We can first use the model evaluations to confirm the 
success of the emulator in describing the luminosity function 
behaviour. In Fig. [6j we compare the values of the luminos- 
ity function in different absolute magnitude bins from direct 
evaluations of the model with predictions from the Wave 4 
emulator. The panels are labeled by the absolute magnitude 
in either the bj or K-band at which the luminosity function 
is sampled. The runs are shown in order of their emulator 
expectation, which is drawn as a think black line. Note that 
adjacent points are not adjacent in parameter space. The re- 
sult of evaluating the model for each parameter set is shown 
as a blue circle. If the model evaluation agreed exactly with 
the expectation value, the blue points would lie perfectly 
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Figure 6. Values of the bj and K-band luminosity functions for direct model evaluations compared with the expectation and variance 
predicted by the emulator. The figure shows 200 runs selected at random from Wave 5. The panels are labeled with the relevant bj or 
K magnitude at which the luminosity function is sampled: these correspond to the vertical lines in Fig. |5] The x-axis shows the run 
number ordered by the value of the luminosity function predicted by the emulator The y-axis shows the value of the luminosity function. 
The solid black line shows the expectation value of the emulator: because the runs have been ordered, the black line traces a smooth 
curve (but adjacent points need not be close in parameter space). The red shaded regions indicates the 3.2a range of uncertainty in the 
emulator output. The open circles show the values of model runs with each parameter set. Note that some regions of the luminosity 
function are easily emulated, while other regions, particularly the bright end of the bj luminosity function, show significantly greater 
variance. 



on the line. However, we expect the points to scatter about 
the line as a result of the uncertainties in the emulation and 
the Monte-Carlo nature of the model. The extreme of the 
predicted range of these uncertainties are shown as the red 
shaded region. This is formally a 3.2a deviation (although 
we do not necessarily expect the tails of the distribution 
to be Gaussian) that is used to define the not implausible 
region. Note that some regions of the luminosity function 
are easily emulated, and the points lie close to the expected 
value. In other regions, particularly at the bright end of the 
bj luminosity function, there is significantly greater vari- 
ance. This is telling us that it is hard to precisely emulate 
the behaviour of GALFORM in these regions. The uncer- 
tainty is, however, well described by the emulator's predicted 
uncertainty. The ability of the emulator to capture this vari- 
ance is key since it has allowed us to efficiently cut down the 
full parameter space of models. Note that a series of similar 
diagnostics were performed for each emulator at each wave 
of the analysis, and in each case the emulator was found to 
provide the expected levels of accuracy (see Vernon et al. 
2010 for details). 

Since we are confident that the emulator has success- 
fully directed the selection of Wave-5 parameter sets, we con- 
tinue to compare the parameter space of acceptable models 
with that suggested by the emulator. Fig. [T] shows a pairs 



plot of the Wave-5 evaluations. Each dot shows the implau- 
sibility of the model evaluation, colour coded to match the 
implausibility colour scale of Fig. [4] The points have been 
superposed so as to illustrate the most acceptable in each re- 
gion. Acceptable luminosity function fits (green points) are 
found throughout the parameter space, and the figure bears 
striking similarity to the analogous Fig. [4] which was based 
on emulator predictions rather than model evaluations. This 
again confirms that the emulator successfully captures the 
behaviour of the GALFORM model. 

Of the 2000 model evaluations, 113 resulted in accept- 
able models according to the criterion 7^,^ < 2.5. The reduc- 
tion in volume is not surprising since we have tightened the 
implausibility threshold and eliminated the emulator vari- 
ance. Furthermore, many of the evaluations were performed 
for marginal models for which the emulator gave only a small 
chance of an acceptable match (recall that the surface of 
the acceptable parameter space is huge in 10 dimensions). 
Despite this apparent inefficiency, if we wish to make a sys- 
tematic investigation of parameter space it is important that 
marginal models are evaluated. Of course, with only a rel- 
atively small number of evaluations, some parameter space 
projections suffer from considerable shot noise, particularly 
the low "optical depth" regions seen above the diagonal in 
Fig|4] Further evaluations of the model could be used to fill 
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Figure 7. The Wave 5 model runs plotted on a set of 2-d projections. The points are colour coded by the model run implausibility, 
rather than the emulator's predicted implausibility. Note that the model has only been evaluated in the "not implausible" region defined 
by the Wave 4 emulator. Green points resulted in acceptable luminosity functions given the model variance and repeatability. The Bow06 
model is shown as a black point. 



in these regions if required. It is already apparent, however, 
that acceptable fits to the luminosity functions can be found 
throughout the wide range of parameter values suggested by 
the emulator analysis. 

We conclude that the emulator method provides an ac- 
curate scheme for identify model parameter sets that are 
very likely to yield acceptable fits to the luminosity func- 
tions. Before we examine the physical links between the pa- 
rameters in the acceptable regions, we use the 113 acceptable 



runs to examine how well models which make acceptable fits 
to the luminosity functions perform in matching other low 
redshift data sets. 

4.3 Further Constraints from Additional Data 

The main focus of this paper is to investigate the constraints 
derived from the bj and K luminosity functions. This fol- 
lows the methodology of Bow06. However, it is interesting 
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Figure 9. This figure illustrates the comparision of models with additional datasets. Models which give an acceptable match to the 
luminosity function data, and produce fits with < Xcut (^^^ Table |6| for all of the additional datasets considered are shown as green 
lines. The same models are highlighted as green points in Fig. |10[ The Bow06 model is shown as a red line for comparison, and the 
observational data are shown as black points with error bars or upper limits (denoted by triangles). The source of the observational 
data is described in the text. With the exception of the gas mass to luminosity comparison, the panels show the comparison for a single 
magnitude slice. The total values shown are derived by summing the contibutions from several magnitude slices. 



to briefly examine how the model may be constrained by in- 
cluding additional data in the comparison. In order to make 
an initial investigation, we apply these constraints as a sec- 
ond phase, so that we only consider models which survive the 
primary criterion of generating an acceptable match to the 
luminosity functions, and we base the further exploration 
on the 113 fully acceptable models that were identified in 
the previous section. The luminosity functions derived from 
these models are shown in Fig. [8] 

We now outline the additional data-sets that we con- 
sider. Note that the first four physical properties listed be- 
low were already used by Cole et al. (2000) and Baugh et 
al. (2005) in choosing the parameter values in their respec- 
tive versions of GALFORM, though we have here updated 
some of the observational data used. Further details of the 
data sets and our approach to the comparison are given in 
Benson & Bower 2009. 

• disk size: We compare to disk size data from De Jong 



& Lacey (2000). We compute the values in a series of bins 
in both magnitude and size. The ideal model would there- 
fore match not only the sizes of local galaxies, but also the 
spread in size. The predicted disk size distribution depends 
strongly on the angular momentum of accreted gas which 
has a complex dependence on halo growth 

• TF relation Galaxy formation models have tradition- 
ally struggled to match the TuUy-Fisher relation. The nor- 
malisation and slope of the relation depend both on the 
relationship between stellar mass and halo mass, and on the 
baryonic contraction of the halo. We compare with i-band 
data from Pizagno et al. (2007). 

• gas metallicity: This is an important constraint on the 
effectiveness of supernova-driven feedback. We compare with 
data from Tremonti et al. (2004) on the oxygen abundance 
of gas in late- type galaxies in the SDSS. 

• Gas mass to Lb'- The cold gas reservoir is sensitive 
to the rate at which gas is accreted, the rate at which it 
is converted into stars and the effectiveness of supernova- 
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Figure 8. The bj and K luminosity functions of the 113 accept- 
able model evaluations found in Wave 5. The axes, colour scale, 
error bars and lines are the same as Fig. [5] 



data set 




•^Bow06 




Xcut 


disk size 


40 


7160 


443 


10000 


TF relation 


20 


124 


44 


150 


gas metalicity 


4 


21 


1 


27 


gas mass to Lg 


9 


3090 


57 


3500 


SDSS colours 


2400 


16600 


5150 


17000 


BH mass 


8 


5 


3 


60 



Table 6. Additional data sets used to constrain the model fits. 
Column 1: the property considered; column 2: the data bins used 
in the comparison. The of the Bow06 model is given in column 
3. Column 4 gives the value for the best fitting model for each 
separate comparison. In order to selected an interesting set of 
models for further comparison between data sets, we used the 
cut off value listed in column 5. Note that the cut-off values used 
for the disk size and Gas mass to Lb constraints are far in excess 
of those expected for a statistically acceptable model. 



driven feedback. We compare with a compilation of HI data 
from Huchtmeier & Richter (1988) by computing the mean 
and standard deviation of the ratio of the HI gas mass to 
B-band luminosity as a function of B-band magnitude. We 
only consider model galaxies with a bulge to total ratio of 
less than 0.4 and gas mass fraction greater than 3%. 

• SDSS colours: We compare with the overall distribu- 
tion of galaxy g — r colours from Weinmann et al. (2006). 
The vast amount of data available for this test results in a 
large number of bins. As discussed in Font et al 2008, this 



match can be substantially improved by adjusting the stellar 
yield of the model. 

• BH mass: We compare to data from Haering & Rix 
(2004). This is a weak test for our model since although 
the parameter Fbh has little effect on the luminosity func- 
tion output, its value can be adjusted to fine tune the black 
hole mass normalisation. Since the parameter is inactive, we 
could have made this adjustment without having discernible 
effect on the luminosity function. We show the comparison 
for completeness only, and we have not undertaken this fine 
tuning step. 

For each additional data set, we reduce the compar- 
ison to a single value. Except where indicated, this is 
achieved by comparing the binned distributions of model 
and observed galaxies. The values for the Bow06 model 
and an indication of the number of bins used in each test 
is given in Table [6] Some examples of the fits to these ad- 
ditional datasets are shown in Fig. |9] Further examples are 
shown in Benson & Bower (2009). As has been extensively 
discussed, the x^ value gives only a coarse indication of the 
fit of the model to the data, and makes no allowance for 
model discrepancy terms that estimate the level at which 
we expect the model to perform in each test. For this rea- 
son, we focus on the performance of each model relative to 
Bow06. It should be noted however that Bow06 provides a 
poor match to the disk size data, the gas mass to luminosity 
data and SDSS colour data. In the case of the disk size data 
(cf. Gonzalez et al. 2009), although the best fitting models 
significantly improve the match to average sizes, they still 
fail to describe the variation with luminosity well, suggest- 
ing that the treatment of angular momentum in the code 
may need improvement. Bow06 also fails to reproduce the 
normalisation of the gas mass to luminosity relation and 
the colours. However, the better fitting models result in a 
substantial improvement to these comparisons. The colours 
are significantly improved by increasing the yield Pyieid, al- 
though the required value is larger than the best estimates 
based on stellar evolution models (cf. Cole et al. 2000) but 
within the range of their plausible uncertainty (cf.. Font et 
al., 2008, Benson & Bower 2009). We see below that the gas 
mass to luminosity ratio is sensitive to the assumed star for- 
mation efficiency (as was earlier found by Cole et al. 2000). 

In a future application of the emulator method, we will 
apply the full emulator technique to encapsulate and hence 
emulate all of these statistical outputs. This needs to be 
combined with a careful analysis of the underlying statisti- 
cal assumptions and the realistic model discrepancy terms. 
For the moment we aim only to make an indicative com- 
parison, and we will only distinguish models which perform 
comparably well to Bow06 (the cut-off x^ values are given 
in the table). It is important to note that x^ cut-off values 
are not intended to denote a statistically acceptable fit to 
the data. As we will see, combining all these data sets in this 
way already restricts the parameter space substantially. 

The dependence of x^ for one of the data sets (disk 
Each panel shows the x^ value 



10 



sizes) is illustrated in Fig 
as a function of one of the 16 input parameters. The Bow06 
model is highlighted in red, while green points highlight 
models which fall below the cut-off x^ value in all of the 
data sets tested. While a significant trend of x'^(disk size) 
with is apparent, it is hard to discern a pattern in most 
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Figure 10. This plot shows the values of the models compared to the observed sizes of galaxy disks as a function of some of the more 
important parameters. The y-axis ranges from to 30000 in each case, while the model parameters cover the range given in Table[T] The 
Bow06 model is shown as a red filled circle. All the open points shown provide an acceptable match to the bj and K luminosity functions. 
Green points highlight models which have measures similar to or better than Bow06 when compared to all of the observational 
constraints listed in Table [6] All the points shown (filled and open) produce acceptable fits to the bj and K-band luminosity functions. 



of the panels. The lack of dependence may arise because the 
model output is not strongly dependent on a parameter, or 
because the dependence is masked by dependence on the 
other parameters. We describe our approach to systematic 
identification of interesting parameter combinations in i|5] 

We have plotted the parameter dependence of for 
each of the other data sets with similar results, and so do not 
reproduce each of the figures here. However, many properties 
exhibit a strong dep end ence on the parameter e^^, and we 
illustrate this in Fig. 11 The contradictory pressures on e^^ 



are apparent. Looking at the open and filled points together, 
it is clear that a better match to disk sizes is obtained by 
increasing this parameter; however, higher values of e^^ tend 
to worsen the match to the TuUy-Fisher relation and the 
gas mass to luminosity ratios of the galaxies. The models 
highlighted in green are models which pass the cut-ofT in 
all of the data sets as well as provinding an acceptable match 
to the luminosity functions. Thus it is nevertheless possible 
to balance these opposing pressures, picking lower values of 
e^^ and exploiting other parameter dependencies to obtain 
adequate fits to the disk size. We discuss the implication of 
these results for future directions of the model in §6.2[ It is 
notable that the green points all produce better matches to 



the SDSS colour data than Bow06 even though we did not 
enforce this in their selection (of course, we have enforced 
some colour information by requiring that models match 
both the bj and if-band luminosity functions). 



A final point to note from Fig.[To]is the hint of bimodal- 
ity in the distribution of green points in the Orchcat and 
Vhot.disk parameters. The two families of acceptable models 
correspond to models with very strong supernova feedback, 
but a cycling time for the galactic fountain close to the halo 
dynamical time, and models with weaker feedback but with 
cycling times substantially longer than the halo dynamical 
time. Bow06 belongs to the first family of models, while the 
second family is more typical of the earlier Durham models 
(Cole et al. 2000; Baugh et al 2005) and models from the 
Munich group (eg., Croton et al. 2006; De Lucia & Blaizot 
2007). Further model runs, and fuller treatment of the ad- 
ditional datasets are required to explore this point further. 
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Figure 11. The variation in for various data constraints as a function of the parameter. From top left to bottom right, the y-axes 
show the derived from comparison of disk sizes, the TuUy-Fisher relation, gas metalicity, gas mass to luminosity ratio, SDSS colours 
and black hole mass. The colours of data points are the same as Fig. |10| 



5 PROJECTION PURSUIT 

5.1 Luminosity function constraints 

With such a high-dimensional data-set, plotting the depen- 
dence of model runs as a function of one or two input pa- 
rameters conveys only a small fraction of the complexity of 
the underlying parameter space. This bi-variate approach 
assigns a special significance to the input parameters and 
thus, although it is not easily possible to present higher di- 
mensional projections, we can make a more informed choice 
of the projection vectors. In particular we can optimise the 
projection to show (1) the reduced dimensionality of the pa- 
rameter space enforced by the constraints we have applied, 
and (2) maximise the additional leverage of the other data 
sets that we have considered in the previous section. This 
optimisation is often referred to as "projection pursuit" . 

One approach is to project the parameter space of "ac- 
ceptable" models along its principal components. This ef- 
fectively corresponds to rotating the region of acceptable 
models to align with the directions of greatest to least varia- 
tion of parameters. In many situations, principal component 
analysis (PCA) is used to find the directions with greatest 
variance; in our case, we are more interested in the directions 
with least variance. These are the parameter combinations 



that are most tightly constrained by the observational data. 
Using PCA we can align the hyperplanes revealed by our 
parameter space exploration so that their cross-section is 
viewed. It should be stressed, however, that PCA is inher- 
ently linear, and that the projected cross-section may hide 
a much thinner, but warped, relationship between the vari- 
ables. 

One potential problem of PCA is that the input vari- 
ables must be scaled so that the variance along different axes 
can be compared. This is somewhat arbitrary. Our approach 
is to scale the variables by the initial search range, rather 
than restricting it to the range over which fits were found 
to be possible. By restricting our range in this way, we are 
aiming to determine whether the high-dimensionality of the 
acceptable space can be reduced by suitable combination of 
parameters. 

A principal component with low variance implies that 
this particular combination of the parameters is tightly con- 
strained if the model is likely to produce an acceptable lu- 
minosity function fit. Of course, even if this constraint is 
satisfied, a good model is not guaranteed; rather we can 
be confident that if it not satisfied the fit is unlikely to be 
good. When analysing the acceptable region in this way, we 
also need to bear in mind that the PCA assumes that the 
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Var 1 


Var 2 


Var 3 


Var 4 


Var 5 


Var 6 


Var 7 


Var 8 


Var 9 


Var 10 


Hiot,disk 


0.198 


0.112 


-0.231 


-0.283 


0.334 


0.311 


0.0637 


-0.114 


-0.256 


0.724 




-0.386 


0.361 


0.136 


-0.00109 


-0.00943 


-0.0733 


0.0977 


-0.265 


0.713 


0.33 


/stab 


-0.149 


-0.147 


0.247 


-0.312 


-0.0218 


0.331 


-0.104 


0.781 


0.236 


0.103 


^^ot , burst 


-0.115 


-0.167 


0.414 


-0.234 


0.102 


0.173 


0.783 


-0.181 


-0.135 


-0.168 


'nD,mrg 


-0.203 


0.0895 


0.0975 


0.729 


0.0545 


0.627 


-0.0033 


0.00626 


-0.116 


0.0238 


Pyiold 


-0.581 


0.043 


-0.263 


0.0517 


0.632 


-0.3 


0.0585 


0.223 


-0.183 


-0.109 


<i* 


0.214 


0.0194 


0.718 


0.0465 


0.49 


-0.152 


-0.392 


-0.127 


-0.0453 


0.0239 


'^reheat 


0.0702 


0.780 


-0.031 


-0.313 


0.051 


0.273 


-0.0775 


0.0104 


-0.104 


-0.44 


— 1 


0.589 


0.111 


-0.166 


0.297 


0.348 


-0.102 


0.383 


0.302 


0.387 


-0.082 




-0.013 


0.421 


0.27 


0.213 


-0.333 


-0.41 


0.229 


0.341 


-0.379 


0.337 


Range Ratio 


0.598 


0.554 


0.496 


0.469 


0.487 


0.357 


0.343 


0.21 


0.167 


0.0945 


Mean 


0.00341 


0.00371 


-0.265 


0.0368 


-0.244 


0.00854 


-0.132 


0.153 


-0.279 


0.697 


Rel. Std. Dev. 


0.679 


0.65 


0.564 


0.487 


0.418 


0.393 


0.323 


0.224 


0.145 


0.0854 



Table 7. Principal components for the acceptable space of luminosity functions. The columns give the PCA variables ordered by 
decreasing relative standard deviation, where the relative standard deviation is the standard deviation of the component when the 
initial range of variables has been scaled to the range ±1. Small relative standard deviations correspond to components that are tightly 
constrained by the requirement of producing a good luminosity function. Dominant input variables in each of the vectors are highlighted 
in bold font. The variables have been ordered so that the most constrained components appear last. 
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Figure 13. Close up of one PCA projection, showing that the 
acceptable space is not linear, and cannot be accurately described 
by the principal components 

relationships are linear, whereas we have seen that the ac- 
tual acceptable space is curved. This will prevent any of the 
suggested projections being arbitrarily thin and limit the 
accuracy of constraints. 

We begin by focusing on the the luminosity function 
data alone. Fig. [l2] shows the result of applying the PCA 
analysis to the 113 model runs which resulted in acceptable 
matches to the luminosity function. As with previous plots 
derived from luminosity function data alone, we consider 
only the 10 "active" variables identified by the emulator 
analysis. The relationship between the PCA variables and 
the original model parameters is given in TablejT] (see below). 

We have previously seen that, by trading between pa- 
rameters, acceptable solutions can be found over almost the 
full range of the input parameters. The PCA reveals a differ- 
ent story, however. In the plot, the gray region illustrates the 



original parameter space projected onto each pair of PCA 
components. The region is not square because the new coor- 
dinate system is not aligned with the axes of the original hy- 
percube. The blue regions show the projection of models for 
which the luminosity function was acceptable. In this space, 
it is seen that many parameters are strongly constrained 
compared to their initial range. In the projections of the 
three most constrained parameters, the good fits shrink to- 
wards a point. For comparison, we show the Bow06 model 
as a red point. In many projections, it is evident that it lies 
well to the side of the main parameter space. 

We can quantify the degree to which parameters have 
been constrained by comparing the range of the PCA com- 
ponents in the initial parameter space to the range covered 
by the acceptable models. Note that although all the in- 
put variable have been scaled to ±1, the range of the PCA 
components may be greater because the component may be 
aligned with a diagonal of the hypercube. The comparison of 
the ranges is shown in Fig. |14[ The figure quantifies the de- 
gree to which the components are constrained. We consider 
the reduction in range, since the ratio of the standard devia- 
tion is sensitive to the orientation of the original hypercube. 
However, similar results are obtained when we consider the 
ratio of standard deviation of the PCA components. Three 
components stand out in particular, with ranges that are 
less than 1/4 of their initial range. In order to have a ad- 
equate match to the luminosity function these components 
must have quite precisely determined values. However, sim- 
ply matching these values does not guarantee a fit to the 
data, since the other components also need to be consid- 
ered. Indeed, all of the PCA components cover a reduced 
range compared to the initial parameter space. 

It is important to emphasise again that PCA cannot 
extract the full interdependence of the parameters since the 
analysis is intrinsically linear. Fig. [13] illustrates the limita- 
tions. The crescent shape arises because the observational 
constraints generate a nonlinear dependence between the 
components of Var. 2 and Var. 10. The PCA has selected 
these directions to minimise the linear variance. On the other 
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Figure 12. The space of acceptable models that reproduce the local bj and K luminosity functions projected along pairs of principal 
components. Blue points show models which produce luminosity functions with implausibility less than 2.5. Grey points show the 
geometry of the initial allowed parameter space in each of these projections. The principal components are chosen to minimise the 
variance of the models with low values of x^- The highest components (eg., Var 10) are the most constrained. The Bow06 model is shown 
as a red filled circle. This figure illustrates the reduction in the dimensionality of parameter space resulting from requiring a good match 
to a particular LF. 



hand, although the emulator analysis includes terms up to 
3rd order, it is unclear how the physical dependencies can 
be extracted. Thus, despite its limitations, PCA provides a 
good means of gaining insight into the physical processes 
driving the match to the luminosity function. 

The relationship between the PCA variables and the 
original model parameters is given in Table [7] This expresses 
each PCA component as a vector direction in the scaled in- 
put variables. The components have all been normalised to 
unit length in the 10-d space so that a large coefficient indi- 
cates that the PCA component direction is closely aligned 
with the input variable. Without first applying the scaling 
given in Table [l] the relative importance would be obscured. 



We consider the weight of the input variables in the most 
constrained components below: although none of the com- 
ponents are completely aligned with the input parameters, 
most have a dominant component that can be identified with 
one particular input variable. These are highlighted in the 
table. Below we consider the three most constrained PCs, 
focusing on components with contribution greater than 0.3. 

Var. 10 This component is dominated by a 70% contribution 
from Vhot,disk- This parameter controls the amount of mass 
ejected from the galaxy disk to the halo for each solar mass 
of stars formed. However, an adequate match to the luminos- 
ity function may be maintained if an increase in Vhot,disk is 
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Figure 14. The reduction in the range of the PCA components 
from requiring that the model produce a low implausibility lu- 
minosity function. The PCA components are ordered by their 
variance, while the vertical axis shows the ratio of the range of 
acceptable models to the range of the PCA components in the 
initial parameter space. 



offset by an increase in Qreheat (decreasing the timescale for 
the material to become available for reaccretion), decreasing 
Qfcooi (increasing the mass scale at which AGN feedback be- 
comes important) and/or decreasing Ohot (reducing velocity 
scaling of the mass ejected from the disk). 
Var. 9 is dominated by a 70% contribution from Qcooi- In- 
creases in Ocooi can be offset by reducing the star formation 
efficiency (through e*) or increasing the mass dependence of 
the disk feedback (through Ohot)- This PCA component also 
has important contributions from Vhot,disk (with the oppo- 
site sign to the cross term in Var. 10) and the disk stability 
criterion. The physical significance of the Var. 9 becomes 
clearer if we consider the effect of adding a small contri- 
bution from Var. 10 to create a new component that is in- 
dependent of Vliot.diak- This Strengthens the dependence on 
Qcooi, while slightly weakening the dependence on cvhot- We 
explore the idea of optimally rotating principle components 
in i }6^ 

Var. 8 is dominated by the disk stability criterion, edisk, with 
significant contributions from the star formation rate effi- 
ciency, e*, and the mass dependence of feedback efficiency, 

Ohot- 

Looking at the less constrained components, we see 
that these often have dominant input variables too. Var. 6 
and Var. 4 are dominated by fdf (the dynamical friction 
timescale); Var. 5 by the yield (increasing the metal abun- 
dance normalisation); Var. 3 by a* (the star formation 
law exponent); Var. 2 by Qrchcat (the timescale for re- 
incorporating ejected disk gas). Var. 1, the least constrained 
component, has equally strong contributions from the yield 
and the star formation rate efficiency. 

We have ordered the variables in Table [7] so that 
the most constrained dominant parameters appears first. 
This emphasises that the most important parameters are 
Vhot.disk, Ocooi and /stab- It is also notable that Ohot (the 
mass dependence of the feedback) is not dominant in any 



particular component, but makes an important contribution 
to many of them. 



5.2 Important directions from other data 

Having shown that projection pursuit using PCA provides a 
useful way of capturing the geometry of the region produc- 
ing acceptable luminosity functions, we proceed to apply a 
second level of analysis to examine how the introduction of 
additional constraints further restricts the allowed region of 
parameter space. 

In order to perform this analysis, we first express each 
model as a function of the principal components identified by 
the luminosity function constraints, as described in §5.1. We 
then renormalise these components so that their variance is 
equal (and recentre the distribution on the mean) . This has 
the effect of mapping the distribution of acceptable param- 
eter region (according to the luminosity function criterion) 
into a roughly spherical distribution centered on the origin. 
We then further restrict the data set using the additional 
data, keeping only those runs with low x^, and characterise 
this reduced space in terms of its new principle components. 
We then map these vectors back into the space defined by the 
scaled input parameters, providing insight into the physical 
differences between models that satisfy both the luminosity 
function constraints and the additional data and those that 
do not. The coefficients quoted in Table|8]are the coefficients 
of the scaled input variables in the additionally constrained 
direction. The scalings of the original input parameters are 
given m Table [l] 

We find that the analysis reveals significant constraints 
that arise from adding data on the disk sizes, the Mhi to 
Lb ratio, and the TuUy-Fisher relation. The dependence of 
on the most constrained component is shown for each 
of the data sets in Fig. |15[ The importance of these data 
sets was alrea dy apparent because of the clear dependence 
on e^^ in Fig. |ll| however, the PCA is capable of revealing 
constraints that are based on combinations of parameters 
that would not have been evident in a simpler analysis. For 
example, contrast the evident bunching of the low points 
in the first panel (which shows x^(disk size) with absense of 
clear trends in Fig. |10| 

The reduction in the standard deviation due to the PCA 
is shown in Table [8] together with the model's additionally 
constrained directions. It is immediately apparent that the 
directions due to the gas mass to Lb ratio and the TuUy- 
Fisher relation constraints are similar. Not only do both 
have dominant contributions from e^^, but the weighting of 
most other variables are also similar. 

While the disk size constraint also depends strongly on 
^, the overall direction of the constraint is different from 
that implied by the Gas to Lb ratio, and the TuUy- Fisher 
relation, and the constrained direction also depends equally 
strongly on the yield and acooi parameters. Thus, models 
which match all the data sets tend to have low values of 
e^^, selecting particular values for the other parameters to 
compensate for this. This is evident in Fig. |11| and explains 
the trend for the accepted models to be squeezed into the 
lower left corner of the first panel. This tension has also been 
apparent in earlier versions of the GALFORM code (eg. 
Cole et al. 2000); however, the analysis scheme presented 
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Relative std. dev. 


Vhot.disk 


'^cool 


— 1 


/stab 


Pyicld 


a* 


^Edd 


disk size 


0.49 


0.301 


-0.519 


0.468 


-0.119 


-0.449 


-0.269 


-0.021 


Gas mass to Lg 


0.40 


0.16 


-0.251 


0.764 


-0.282 


-0.197 


-0.334 


-0.205 


TF relation 


0.29 


0.264 


-0.33 


0.775 


-0.207 


-0.202 


-0.24 


-0.196 



Table 8. The most additionally constrained directions using various complimentary data sets in addition to the luminosity function 
constraints. For each data set, the table shows the most constrained component vector together with the factor by which the standard 
deviation is reduced. Directions are quoted relative to the normalised parameter range: the scaling is given in Table [T] These components 
provide physical insight into the effect of introducing the additional data sets. Only data sets which resulted in a significant additional 
constraint are shown: the other data sets considered in §4.3 do not result in strong additional linear constraints on parameter combinations. 
Parameters with no contribution greater than 0.2 have been suppressed. Note that the best constrained directions are similar for the 
TuUy-Fisher and gas mass to luminosity data sets. 




Figure 15. We show the values of the models as functions of some of the most additionally constrained principal components 
identified in Table [s] In each panel, we show the most constrained component (x-axis) resulting from requiring the model to match the 
data sets shown on the y-axis. Note that the vector direction represented by the axis is different in each plot (see Table |8]|. The Bow06 
model is shown in red. Green solid points highlight models which gave fits comparable to, or better than, Bow06 in all the tests given 
in Table |6] Note how the chosen projection results results in a bunching of the low models campared to the full set. All the points 
plotted produce an acceptable match to the luminosity function data. 



here provides an objective means of identifying the interplay 
between observational constraints and model parameters. 



6 DISCUSSION 

In this paper, we have set out to explore the parameter space 
of the GALFORM semi-analytic model. The model imple- 
ments the key physical processes that define the formation 
of galaxies, including the hierarchical growth of dark mat- 
ter haloes, gas accretion and cooling, and the feedback ef- 
fects of supernovae and AGN. The model we use contains 16 
parameters that describe these processes. Some parameters 
have strongly constrained, physically plausible values; oth- 
ers represent poorly understood physical processes and are 
only weekly constrained by the observational data used here. 
Bow06 identified a largely successful point in this parameter 
space by matching the local bj and A'-band luminosity func- 
tions. In this paper we set out to find out how unique this 
point is and to explore whether other parameter combina- 
tions could perform equally well (or better) at reproducing 
local galaxy properties. 



6.1 The Model Emulator Technique 

We have adopted the Model Emulator technique to efficiently 
but conservatively search the 16-d parameter space. The 
essence of the approach is to build a statistical predictor 
for model results on the basis of a limited set of model runs. 
This is a Bayesian approach, where we are aiming to quantify 
the information we derive from the runs. We use the statis- 
tical model to identify regions where we are confident that 
an acceptable fit will not be found. Such regions are denoted 
as Hmplausible' and excluded from further consideration. A 
second wave of emulation is then performed to better char- 
acterise the surviving portion of parameter space. After 4 
waves of emulation, we are left with an accurate emulation 
of the model that defines a 'not implausible' region contain- 
ing 0.26% of the original volume. A run within this region 
is not guaranteed to give a statistical match to the obser- 
vational data, and subsequent runs within this region are 
needed to identify acceptable model realisations. 

In this paper, we have focused on matching to the ob- 
served 6,7 and K-hand luminosity functions of galaxies in 
order to parallel the approach used in Bow06. An impor- 
tant issue has been to quantify the observational and model 
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uncertainties. A key concept has been that of model discrep- 
ancy. This term accounts for the expected accuracy of the 
model itself: since the semi-analytic model is inherently an 
approximation to the true physical process, we are willing 
to accept models which come close to the observational data 
but do not match it exactly. This is a key concept in our 
Bayesian approach, and it differs from most previous work 
in which the philosophy is to adapt the model parameters 
to find fits consistent within the observational uncertainties 
alone. We argue that it is important to explicitly account 
for the approximate nature of the model, and that ignoring 
the model discrepancy term will lead to over-zealous exclu- 
sion of some regions of parameter space. In this sense, the 
model discrepancy term makes our approach conservative. 
Naturally, it is potentially of interest to investigate the ef- 
fect of reducing the model discrepancy term in order to guide 
improvements to the model. However, we feel that improve- 
ments in the model are currently better driven by compari- 
son to additional data sets, where the tensions in the model 
can be more easily exposed (as illustrated in §4.3). Formal 
discussion of the process of "model reification" can be found 
in Goldstein & Rougier 2008. 

We have accepted models at the 2.5a level, which 
equates to models that come within a factor of approxi- 
mately 1.54 of the observed luminosity function (ie., a fac- 
tor of lo^-sxo o'^ss, see §3.6). Since the dynamic range of 
the luminosity function covers almost 5 orders of magnitude 
in space density this term appears relatively small in any 
visual comparison, as can be seen from Fig. |8] Acceptable 
luminosity functions are generally very good matches to the 
luminosity function around the knee of the luminosity func- 
tion. It is evident that many models struggle to match the 
very bright part of the bj luminosity function, even though 
they provide a good description of the break in the K-band 
luminosity function. Physically, these models tend to allow 
a small amount of star formation to take place in the most 
massive systems. 

Nevertheless, the model emulator suggests that plausi- 
ble matches to the luminosity functions may be found over a 
large fraction of the parameter ranges. However, this results 
from a fine-tuned interplay between parameters and the ac- 
ceptable models are confined to thin hyperplanes. Out of 
the 16 input variables, we find that 10 need to be taken into 
account in order to make the emulator predictions. These 
are labeled 'A' in Table [l] The remaining parameters (eEdd, 
ydlip ) ./"burst ; -f\>h ; Vcut and Zcut ) have a weak effect on the lu- 
minosity functions (although it may have significant impact 
on other aspects of the model) . Although we have achieved a 
large reduction in parameter space, further model runs need 
to be performed to determine whether each point actually 
returns a statistically acceptable luminosity function. Out of 
a sample of 2000 runs surviving the Wave 4 implausibility 
cut-off, we find 113 runs that provide acceptable fits to the 
luminosity function, implying that only 0.014% of the ini- 
tial input parameter space is compatible with the combined 
observational constraints and model discrepancy terms. 

Although the model emulator technique is gaining 
favour in other areas of scientific modeling, we could have 
applied the Markov Chain Monte Carlo method to the prob- 
lem. This approach has been pursued by Henriques et al. 
2009 and Kampakoglou et al. 2008. However, they consider 
only a small subset of the possible parameters, analysing a 



6d and 7d space respectively, while we consider 16d space. 
We note that while "only" 10 active variables are included 
in our fitting functions, the full parameter space is consid- 
ered by the emulator. The active variables do not need to be 
guessed in advance of the minimsation process, and are all 
varied in all our GALFORM runs. Moreover, the approach 
we adopt is readily adapted as introducing new physical 
processes in the GALFORM model generates even higher 
dimensionality (eg., Baugh et al. 2005; Benson & Bower, 
2009). 

Our galaxy formation model is most comparable to that 
used by Henriques et al. 2009 (which is based on Croton et 
al. 2006). Comparison of the results is, however, complicated 
because several of their chosen parameters do not have di- 
rect equivalents in our code and their analysis does not allow 
for any model discrepancy term. Nevertheless, their results 
for luminosity function constraints are qualitatively similar 
to ours, given the lower dimensionality of their investiga- 
tion. A significant difference is that they find that including 
constraints on the black hole mass - bulge mass correla- 
tion strongly constrains the parameterisation. In contrast, 
we find that this relation adds little constraint. This arises 
from the very different treatment of black hole feedback in 
the two models: with the larger number of parameters that 
we consider, we find that the black hole mass may be ad- 
justed largely independently of the galaxy luminosity func- 
tion. This illustrates the potential danger of including too 
few dimensions in the analysis, since this artificially restricts 
the dimensionality of the problem. As we have stressed, pa- 
rameters should be allowed freedom to vary even if their 
prior distributions are significantly constrained. If the emu- 
lator finds that the prior range has little impact on the model 
output, it will reject the parameter from further considera- 
tion. This is not a qualitative decision that should be made 
at the outset. In future models, we will also allow the cosmo- 
logical parameters to vary within their prior distributions. 
This is difficult if the model is driven by N-body simula- 
tions, but is possible if it is driven by the improved Monte 
Carlo merger tree generators such Parkinson et al. (2008). 
In principle, the MCMC method (eg. Trotta 2008) could be 
applied to higher dimensionality problems, however, it then 
becomes hard to drive convergence of the MCMC chains. 
In the statistical model fitting literature, these scaling prob- 
lems have meant that MCMC is falling out of favour, being 
replaced by the emulator techniques that we have applied 
here, which were specifically designed to deal with high- 
dimensional models (eg. Oakley & O'Hagan 2004, Heitmann 
et al. 2009). As the dimensionality of the GALFORM model 
increases further (eg., Benson & Bower, 2009) the advan- 
tages of the emulator technique become even more relevant. 

Our experience in 16-d space suggests that the tech- 
niques could be extended to even higher dimensions, with 
a relatively modest increase in computing effort. To arrive 
at the emulation presented here, we performed 5500 model 
runs. This was by no means the minimum number of eval- 
uations, and we have tended to be very conservative in the 
strategy adopted. A number of techniques could be used to 
speed up the calculations, for example performing smaller, 
low accuracy runs for the initial waves, increasing the ac- 
curacy as the not implausible region shrinks. An important 
aspect of the emulator technique is that variables are only 
explicitly included in the emulator, when the statistical im- 
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provement in doing so is justified. Thus adding irrelevant 
(and even degenerate) variables does not unduly handicap 
the method. 



6.2 Projection Pursuit 

The emulator technique provides a reliable statistical de- 
scription of the GALFORM model. This allows rapid esti- 
mation of the likely success of different regions of parameter 
space. However, the statistical model does not easily pro- 
vide insight into the physical interactions between parame- 
ters. PGA of the successful runs provides a complimentary 
analysis, allowing us to orient the parameter space so as to 
reveal the most tightly constrained model parameters. We 
find that 3 directions are strongly constrained, correspond- 
ing to hnear combinations of the Vhot,disk, Ocooi, ftrohcat 
and Qfhot. This makes good physical sense: increasing the 
strength of disk feedback needs to be balanced by reducing 
the timescale for reheated gas to fall back to the disk, for 
example. Setting these parameters to an appropriate combi- 
nation is necessary to arriving at good models but it is not 
sufficient, and we stress that all 10 active parameters play a 
significant role. 

As well as providing physical insight, the PGA compo- 
nents provide a means of orienting future parameter space 
exploration. For example, if we were to start from a single 
acceptable model, we can attempt to generate another by 
moving along the PGA hyperplanes. If we select a value for 

Viiot. I disk 5 

Var. 10 (in Table|9]) suggests we adopt a particular 
combination of ficooi, ciroheat and Qhot (plus smaller contri- 
butions from the other parameters). Since Vhot,disk is fixed, 
Var. 9 and Var. 8 provide two additional linear equations 
linking (primarily) Qcooi, and Qhot. By choosing one of 
the 4 variables, we can invert the system of equations and 
arrive at a good "guess" for an acceptable model. For ex- 
ample, we have already suggested that angular momentum 
transport in the model needs to be improved. This would 
result in denser gas disks and thus a change in the disk dy- 
namical time. However, from outside the galaxy this change 
would be broadly like a shift in the effective value of the e^^ 
parameter. 

We can investigate this further by noting that the vari- 
ance of Var. 8, 9 and 10 is similar. Together, they define 
the directions of a 7-dimensional hyperplane within which 
the model is significantly less well constrained, while it is 
strongly constrained in the 3 perpendicular directions. How- 
ever, we can select a new linear combination of these compo- 
nents and define the 7-plane equally welQ In particular, we 
can choose to "diagonalise" the contributions from Vhot.disk, 
Qcooi and /stab, the variables with the largest contributions 
to the PGA components. The result of applying the diago- 
nalisation procedure is to define the new component vectors 
given m Table |9] If we now drop terms with coefficients less 
than 0.2, we can arrive at a simple (but approximate) de- 
scription of how 14iot,disk, cicooi and /stab should vary as a 





Var 8 


Var 9 


Var" 10 


^hot,disk 






1 


f^cool 




1 




/stab 


1 






^'hot, burst 


0.30 


-0.14 


0.13 


'^0,mrg 


0.03 


-0.11 


-0.08 


Pyicld 


0.17 


0.32 


-0.03 


a* 


0.16 


0.0 


-0.05 


"^reheat 


-0.14 


0.26 


-0.47 


— 1 


0.44 


-0.29 


-0.30 


^hot 


0.37 


0.43 


0.62 


Mean 


0.29 


-0.13 


0.99 



Table 9. The 3 most constrained PCA components have been 
combined to diagonalise the constraints on Vhot.diaki ctcool and 
/stab- We use the notation Var" 8 (etc) to emphasise that these 
variables are distinct from those in Table [t] This rotation makes 
the physical dependence of these strongly constrained variables 
evident (see text for details). Coefficients larger than 0.2 have 
been highlighted in bold. Note that these directions are not nor- 
malised. 



function of the other active variables in order to keep within 
the 7-plane. These equations are given in Table |10[ where 
we have translated the scaled variables back to their original 
units in order to make the physical dependence more ap- 
parent. Because we ellimated terms with small coefficients, 
T^o.mrg and Q* do not appear in these expressions. Examin- 
ing the equations gives useful insight into the interplay of 
the parameters. For example, it shows that most successful 
models require very high values for the feedback parameter 
Vhot,disk, close to the maximum value allowed in our analy- 
sis. In order to match the luminosity function with a lower 
value, low values must be chosen for Qrchcat (the timescale 
on which reheated gas fall back into the galaxy must be 
increased) and (the disk star formation rate must be 
reduced) and high values for Shot (the relative strength of 
feedback in low mass galaxies is increased). Similarly, we see 
that high values for Shot imply that Qcooi must be increased 
(increasing the mass at which the radio-mode feedback op- 
erates); however, decreases in both Qrcheat and e^^ have a 
compensating effect on Qcooi. The appropriate choice of Qcooi 
is also dependent on Pyieid, as would be expected from the 
metalicity dependence of the cooling function. 

These trends can be confirmed by looking at Fig. [1] 
however it is evident that the description of the trends in 
terms of a few components loses a great deal of the true com- 
plexity of the underlying parameter space. Thus, while con- 
sideration of the PGA components provides a useful guide 
to how we should remap the parameter space in order to 
capture the best fitting parameter region; it must be remem- 
bered that PGA is fundamentally linear and its description 
is very approximate. The restriction to linear dependencies 
does not apply to the emulator technique. 



^ If the variances had been equal, the PCA components would 
be degenerate and all linear combinations would be equivalent. 
We could visualise a 3-disk lying within the 7-plane. Since the 
variances are not exactly equal, the 3-disk is distorted slightly 
into an ellipsoid. 



6.3 Additional datasets 

In this paper, we have followed the approach of Bow06, pri- 
marily requiring that the model be able to reproduce the 
bj and K luminosity functions at an acceptable level of ac- 
curacy, only examining the other data sets for the models 
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Vhot.disk = 550 + 110a,ehoat + 70f;r^-140ahot (kms"!) 

Qcool = 0.64 + 0.16pyicid + 0.13a,choat - 0.14f-^ +0.22ahot 

/stab = 0.84 - 0.04 t>hot,bui-st - 0.07 eT' - 0.06 Qhot 

Table 10. Equations describing the broad brush interactions between the most constrained parameters. These relations have been 
derived from Table [9] Note that these relations are highly approximate and only capture a small fraction of the behaviour described by 
the model emulator. 



which passed this test. Other published models have often 
included additional observational constraints from the start 
(e.g. Cole et al 2000). Clearly an alternative strategy to 
that followed here would be to emulate the full range of 
data at the outset, perhaps including both high and low 
redshift constraints. We have been initially cautious of this 
approach since the model implausibility must be combined 
across different data sets in a carefully thought-out manner 
and weighting the data from prior knowledge would require 
much consideration. Moreover, if the two data sets are in 
contradiction, this can lead to over-confident exclusion of 
parameter space regions, rather than pointing to a particu- 
lar area in which the model needs improvement. The tension 
between the disk size and Tully-Fisher relation data sets is 
a clear case of this. Applying a strict requirement that the 
model match both data sets greatly restricts the acceptable 
parameter region. The physical cause of this tension is the 
baryonic contraction of the halo, something that is peren- 
nial problem in models of disk galaxies (cf. Cole et al. 2000). 
However, the degree of contraction is strongly dependent on 
the angular momentum distribution of the accreted mate- 
rial and the treatment of contraction in disk instabilities 
(see Benson & Bower 2009). It is therefore questionable if 
we should reject models on the basis of this tension at the 
outset, and an approach of considering data sets separately 
may be preferable. 



We find that the disk size, Tully-Fisher and gas mass to 
luminosity ratio data sets provide the strongest additional 
constraints (ie., in addition to the luminosity function con- 
straints). Qualitatively similar conclusions were reached pre- 
viously by Cole et al. (2000), using an earlier version of the 
GALFORM model, and following the traditional approach 
of perturbing parameters away from their "best fit" values 
and visually inspecting the results. In contrast, the meth- 
ods we develop here are objective and do not rely on visual 
assessment. Although, data on colours and metalicity pro- 
vide additional restrictions, at the level that is considered 
here, they provide little additional leverage over compari- 
son of the luminosity functions. Future model changes that 
improve treatment of angular momentum and baryonic con- 
traction should allow these data sets (and others that we 
have not considered here, such as the X-ray luminosities of 
groups and clusters. Bower et al. 2008) to play a more con- 
sistant role. Inclusion of high-z data sets is also possible; 
however, there is clearly a balance to be struck between us- 
ing all the available data to calibrate the model and holding 
some data sets in reserve to act as a test of the model's 
predictive (or rather "post-dictive" ) power. 



7 CONCLUSIONS 

The GALFORM model is a semi-analytic model of the 
galaxy formation process that has been extensively used to 
understand the formation of galaxies. Different versions of 
GALFORM have used varying prescriptions for some of the 
key physical processes, and been tuned to match different, 
though overlapping, observational datasets (Cole et al. 2000, 
Baugh et al. 2005; Bow06). In particular, the Bow06 ver- 
sion of the model, incorporating AGN feedback, has been 
successful in providing a good description of both the bj 
and A'-band luminosity functions of present-day galaxies 
and the evolution of the /('-band luminosity function with 
redshift. However, the Bow06 model represents one selection 
of parameters from a vast 16-dimensional parameter space 
(which is in turn a subspace of the full GALFORM param- 
eter space), and it is natural to ask how unique the model 
is. In particular, we would like to identify the region of pa- 
rameter space that is constrained in this way, allowing us to 
understand the degeneracies between the input parameters 
and their relative importance. Unfortunately, direct evalu- 
ation of the model with a uniform and sufficiently dense 
covering of the parameter space is not feasible because of 
the high-dimensionality of the parameter space and the rel- 
atively long run-time of the model. This is a common prob- 
lem in many computer-modeling disciplines, such as climate 
change modeling, and there is great interest in developing 
efficient mathematical techniques that optimise the use of 
the available computing resource. 

In this paper, we have used the model emulator tech- 
nique (Craig et al. 1996; Craig et al. 1997; Kennedy & 
O'Hagan 2001) to explore the full 16 dimensional param- 
eter space of the Bow06 galaxy formation model. Rather 
than trying to evaluate the model directly at all points in 
parameter space, we run the model at a sparse sampling of 
points and then construct an emulator of the model that 
allows us to interpolate between these points. A key aspect 
of this construction is that we are not only able to establish 
an expectation value for the model's performance between 
runs, but we are also able to encapsulate the degree of uncer- 
tainty in this estimate. Thus we can thus use the emulator 
to identify regions in which the run outcome is uncertain 
and target additional evaluations there. By proceeding in 
waves of iterations, we focus the model evaluation down on 
a smaller and smaller region in which run outcomes are likely 
to accurately reproduce the observational data. 

Another important aspect of our approach is that we 
introduce the concept of "model discrepancy". This is a 
small additional variance term that is included to explic- 
itly account for the level of approximation inherent in the 
GALFORM model. This term means that model luminosity 
functions which lie within a factor of 1.54 of the observed 
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luminosity functions are deemed acceptable, even if the ob- 
servational errors are much smaller than this. This term en- 
sures that we do not attempt to over fit the model and rule 
out regions of parameter space when this is not justified 
(given the approximate nature of the model). 

The method is shown to be highly effective. We find 
that 0.014% of the input parameter space produces model 
luminosity functions that are acceptable matches to the ob- 
served bj and AT-band luminosity functions from Norberg et 
al. (2002) and Cole et al. (2001) (although, in common with 
previous versions of the code, acceptable models tend to lie 
slighly above the faintest if -band measurements). However, 
we find that although the region of parameter-space is small, 
acceptable fits can be obtained as parameters are adjusted 
over a large fraction of their input range. We show that the 
choice of parameters in the Bow06 model is not unique in 
reproducing the observational data and that other choices of 
parameters perform at least equally well: changes in one pa- 
rameter may be compensated by adjusting several other pa- 
rameters to leave the predicted luminosity functions almost 
unchanged. However, while some parameters play a vital role 
in adjusting the luminosity function to match the observa- 
tional data, others have little effect on this (when adjusted 
within the range explored). Interestingly these inactive pa- 
rameters include the re-ionisation parameters (wcut, ^cut), 
suggesting that these affect only galaxies below the faintest 
luminosities included in our luminosity function, and the 
merger parameters (those arc masked by the dominance of 
disk instabilities in the model and, to some extent, the lack 
of morphological constraints). 

In order to explore the parameter dependencies fur- 
ther, wc have investigated using principal component anal- 
ysis to identify optimal projections of the data. These al- 
low us to identify the hyperplane onto which the data is 
constrained by the luminosity function data. This analysis 
reveals physically interesting interactions between the data, 
although each plane generally includes contributions from a 
large number of parameters. We have briefiy explored how 
these directions can be rotated to reveal simple relations 
between the parameters that must be obeyed in order to 
obtain a well fitting model. For example, we are able to 
quantify the relationship between the feedback parameters 
(Fhot ,disk5 Vhot,bursti CKhot ), the return timescale for reheated 
gas (orchcat) and the star formation parameter (e* ^). How- 
ever, although these relations can provide a useful check on 
physical intuition and understanding, they do not reproduce 
the full non-linear complexity of the model that is captured 
by the model emulator. 

We have also briefiy explored the impact of adding ad- 
ditional data sets to constrain the model by evaluating a 
simple statistic against observational data sets describing 
disk sizes, the TuUy-Fisher relation, gas metallicity, the gas 
mass to light ratio, the galaxy colour distribution and the 
black hole mass - bulge mass correlation. We find that the 
disk sizes, the gas-to-light ratio and the TuUy-Fishor rela- 
tion place the strongest additional constraints on the model. 
Similar conclusions were reached by Cole et al. (2000) using 
an earlier version of GALFORM, who followed the tradi- 
tional approach of manually varying parameters and then 
visually inspecting the results. The great advantage of the 
new approach presented here is that it is automated, objec- 
tive, and reveals the couplings between different parameters 



in their effects on observable quantities. However, we find 
that the direction of constraint from the disk size data is in 
contradiction to the constraints imposed by the gas mass to 
luminosity ratio and Tully-Fisher relation data. The disk size 
data presents a particular challenge for the Bow06 model, 
and the issue has been explored in detail in Gonzalez et al. 
2009. 

The model considered here was deliberately restricted 
to the version of GALFORM used in Bow06. In future pa- 
pers we will apply the same general methods to a broader 
class of GALFORM models, including processes such as 
feedback from galaxy super- winds (Benson et al. 2003a), 
variations in the IMF (Baugh ct al. 2005), ram-prcssure 
stripping (Font et al. 2008), and X-ray emission from hy- 
drostatic haloes (Bower et al. 2008), greatly increasing the 
model parameter space. This leads to a new challenge — 
but it is one that we now have the statistical techniques to 
meet. 
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